Statistical thinking will one day be as necessary for efficient 
citizenship as the ability to read and write. 


H. G. Wells (1866-1946) 


For centuries, the general scientific and philosophical view was that all the 
variability in a phenomenon could be explained if only all the factors 
which cause the variation could be identified: it was believed that, once all 
relevant factors had been identified, a model could be developed to explain 
the variation and to predict what will happen. However, in many 
situations, to identify all the variables involved and to develop a model 
which includes them all can be a complicated if not impossible task. 


By the 18th century, the need for models which incorporate uncertainty 
was recognised. For instance, Gauss (1777-1855) and Laplace (1749-1827) 
both recognised the usefulness of using the notion of chance to model 
imprecision in measurements. Since that time, models incorporating 
uncertainty have been applied in many different fields — insurance, nuclear 
physics, genetics and astronomy, to name a few. Models involving chance 
have also been developed for the spread of an epidemic and the behaviour 
of a queue. 


The title of this block is Modelling uncertainty. In Chapter D1, we begin 
by considering a concept which is fundamental to all models for chance 
events and which underpins statistical thinking: probability. In 
Chapters D1 and D2, two models for the variation observed in a variable 
are discussed — one model for a discrete variable and one for a continuous 
variable. A key factor in the development of statistical thinking was the 
desire to use information gained from a sample to make inferences about 
the population from which the sample was taken. Chapters D3, D4 and D5 
are all concerned with drawing inferences about populations from samples 
of data; each chapter looks at one type of statistical investigation. 

Chapter D3 looks at estimating an unknown quantity; Chapter D4 
investigates differences between populations by comparing samples of data; 
and Chapter D5 is about looking for relationships between variables. 


The first step in any statistical investigation is to specify its purpose and 
pose a precise question. Once this has been done, relevant data are 
collected. You will recognise these as two aspects of the first stage of the 
modelling process: specifying the purpose. The next step is to analyse the 
data — this is the ‘doing the mathematics’ stage of the modelling process — 
and then the results are interpreted. Chapters D3 to D5 concentrate on 
posing a precise question, analysing the data and interpreting the results; 
you will not be asked to collect the data yourself — the data are provided. 


When analysing data that have been collected, statisticians tend to use 
software which has been designed for this purpose. Mathcad is not 
designed as a statistical analysis tool, so in this block, instead of using 
Mathcad you will be using the statistics software package which has been 
supplied to you with the materials for this block. If you have not already 
installed the software on your computer, then you will need to do so before 
you begin the computing section in Chapter D1. One module of this 
software package, called StatsAid, consists of consolidation material on 
some of the statistical topics which were covered in the preparatory 
materials. If you are unsure of any of the ideas discussed there, then you 
may find it helpful to work through the relevant material on the computer. 


You do not have to work through this material: it is resource material 
which you can use if you need further practice. You will find instructions 
for its use in an appendix to Computer Book D. 


In this block, you will not just be involved in modelling uncertainty, you 
will also be putting together some of the skills you have been developing 
throughout the course — your skills of communication and communicating 
understanding. The work you have done in the course so far has involved 
you in communicating understanding for two main reasons: 


• to help you to learn, to clarify and to refine your own ideas; 
• to communicate effectively to other people. 


Many opportunities have been presented for you to think about how you 
are doing and learning mathematics, and some of these have involved 
communication: for example, saying mathematics out loud to yourself, 
writing summaries to help clarify your ideas, talking difficulties through 
with other students, and so on. Being able to communicate mathematical 
and technical concepts to others (both experts and non-experts) has also 
been a necessary requirement, and you may have found this aspect a 
valuable aid to learning and understanding. As you work through this 
block, you will have the opportunity to get involved in various methods of 
communication. These are all professional skills in their own right, but 
they should also help you to enhance your own understanding, because the 
process of preparing one or more of these forms of communication forces 
you to organise, clarify and understand the significant aspects of a problem 
and its solution. 


Chapters D1 and D2 introduce ideas and give you opportunities to think 
about and practise discussion work and giving presentations. In 
Chapters D3 to D5, you can get involved in thinking about how to 
structure a written report. 


You should schedule four study sessions for your work on 
this chapter, of which the second will use the computer. 
Each section of this chapter depends on ideas and results from 
the preceding sections, so we recommend that you study the 
sections in order. A possible study pattern is as follows. 


Study session 1: Section 1. 


Study session 2: Section 2. You will need access to your 
computer for this session, together with the statistics 
software package and Computer Book D. 


Study session 3: Section 3. 
Study session 4: Sections 4 and 5. 


If you follow this study pattern, then you will probably find 
that Sessions 2 and 4 take longer than Sessions 1 and 3. 
You may prefer to end with a shorter study session 
(particularly if you have met the idea of probability before), 
in which case an alternative study pattern is as follows. 


Alternative study session 1: Section 1. 
Alternative study session 2: Section 2 (computer). 
Alternative study session 3: Sections 3 and 4. 
Alternative study session 4: Section 5. 


If you follow this study pattern, then the second and third 
sessions will probably be long ones and the last will be quite 
short. 


Another possibility is to study Section 3 and Subsection 4.1 
in one session, and Subsection 4.2 and Section 5 in another. 


It is remarkable that a science which began with the consideration 
of games of chance should have become the most important object 
of human knowledge. 


Théorie analytique des probabilités 
Pierre Laplace (1749-1827) 


Chance has been an accepted part of life for many thousands of years. 
The fears and uncertainties associated with people’s experiences of birth, 
sickness and death, wars, earthquakes, droughts and floods have helped to 
shape their sense of the unpredictable nature of these life events. 
Throughout recorded history, people have sought explanations for these 
sorts of chance events, and linked them to a variety of beliefs and 
superstitions, many of which continue to flourish today. 


A concept which is essential for modelling chance events is that of 
probability: a probability is basically a number which measures the chance 
of an event occurring. The desire of some gamblers to analyse various 
games of chance, particularly ones involving dice, provided the stimulus for 
the efforts which eventually led to the development of the fundamental 
ideas of probability theory. Once developed, these ideas were very rapidly 
applied in many other fields. 


In this chapter, some fundamental ideas about probability are introduced, 
and some of the original problems which prompted the development of 
probability theory are described and analysed. You will see how these 
ideas can be applied to model some other situations involving an element 
of chance: for example, the births of boys and girls. 


Section 1 begins with some early history of games of chance, and then the 
probability of an event is defined. Before any ways of calculating 
probabilities are introduced, you will be asked to consider a number of 
questions and to record your ideas about the chances of various events. 

In the computer section which follows, you will be invited to explore some 
of these questions and to compare the results you obtain with the ideas 
you noted when using only your intuition. You will also be asked to make 
hypotheses about a number of the questions on the basis of your 
explorations using the computer. In the remaining sections, some basic 
rules of probability will be introduced and then applied to answer the 
questions raised in Section 1. You will be able to compare your intuitions 
and hypotheses with the results obtained using probability theory. 


There are two main Learning File themes for this chapter. The first relates 
to discussing your ideas and how you might develop discussion skills for 
different groups of people. In discussion groups you may well be involved 
in talking your ideas through, convincing other people, answering 
questions and asking them, and it can be helpful to think about how you 
might work with different groups of people and deal with different topics. 
Selected Learning File activities give you the opportunity to get started on 
this important skill. 


The second theme extends work you have already done on language 
and notation, but here relating to probability. As you have already 
experienced, in order to understand mathematical expressions and 
statements, you need to be able to ‘read’ them, that is, to translate the 
symbols on the page into words. To help you in this task, you may find it 
useful to make a list of new notation and its meaning as it arises, together 
with any suggestions for reading it given in the text. You may also wish to 
use any strategies that you adopted in Block C relating to reading 
notation, perhaps refining and developing them. 


How did the development of a theory for quantifying chance come about? 
How is chance measured? Are your intuitions about chance events reliable? 
These are the questions which underlie the material in this section. 


In Subsection 1.1, some early history of games of chance is discussed 
briefly. A definition of the probability of an event — a number which 
quantifies how likely an event is to occur — is given in Subsection 1.2. And 
in Subsection 1.3 you are asked to use your intuition and experience to 
propose answers to a number of problems involving chance. 


1.1 Games of chance — some history 


Games of chance have been around for a very long time. Boards and 
counters dating back to 3500 BC have been found in Egypt. There are 
tomb-paintings which suggest that some games from that era involved 
moving counters on the throw of an astragalus. (An astragalus is a bone 
from the heel of an animal.) The painting in Figure 1.1 shows a nobleman 
in the after-life using an astragalus in a board game. 


Photograph reproduced 
courtesy of the Oriental 
Institute of the University of 
Chicago. 


Figure 1.1 Ancient Egyptian wall painting — tomb of Neferronpe 


The shape of an astragalus is such that when it is thrown, it can land in 
one of four positions (it has four fairly flat sides). The astragalus was 
almost certainly the forerunner of the six-sided die of later times which 
eventually replaced it. Astragali and dice were both in common use for 
many centuries. Early dice were roughly hewn and uneven in shape, and 
no two astragali were the same. So experience gained using one astragalus 
or die could not be used to predict reliably how another might behave. 
Thus, for a long time there was no impetus for developing a theory to 
explain the nature of such chance events as the throw of an astragalus 
or the roll of a die. A sketch of an astragalus from a sheep is shown in 
Figure 1.2, and a six-sided die in Figure 1.3. 


Figure 1.2 Astragalus from a sheep 
Figure 1.3 Six-sided die 


The Liber de Ludo Aleae 
(Book on games of chance) 
was found among Cardano’s 
papers after his death, but 
was not published until 1663, 
about a hundred years after 
it was written. 


If one event occurs, on 
average, four times for every 
three occasions on which a 
second event occurs, then we 
say that the odds are four to 
three in favour of the first 
event. 


These letters also contain 
discussion of a number of 
other mathematical problems. 

It is not possible to say when gambling originated, but gaming, that is, 
gambling on the outcomes of games of chance, was widespread in the 
Roman Empire; it was the common recreation of the time among all 
sections of society. In the centuries which followed the fall of the Roman 
Empire, gambling and games of chance continued to flourish in Europe, in 
spite of the vigorous opposition of the Christian Church. By this time, dice 
were well made and so some of the most inveterate gamblers must have had 
some intuitive idea about the relative chances of the different outcomes in 
the dice games they played. However, throughout this period, the Church 
regarded secular learning with deep suspicion, and it was not until much 
later that serious attempts were made to understand and quantify chance. 


When a die is rolled, there are six possible outcomes: the scores 
1, 2, ..., 6. There is evidence that an understanding of the concept of 
equally-likely outcomes on the roll of a die had been achieved by the 15th 
century, and from that time on, efforts were made to explain differences 
that were observed in the relative frequencies of the various outcomes in 
dice games. In the 16th century, Girolamo Cardano, a scholar and 
gambler, made the step from observation to theory. He wrote the following 
about the outcomes of the roll of a die in his book Liber de Ludo Aleae. 


One-half the total number of faces always represents equality; thus the 
chances are equal that ... one of three points will turn up in one 
throw. For example, I can as easily throw one, three or five as two, four 
or six. The wagers therefore are laid in accordance with this equality. 


In later chapters of the book, he discussed the results of rolling two and 
three dice, and went on to calculate the odds of various outcomes. He also 
made calculations for a number of card games. 


The beginnings of modern probability theory are often attributed to the 
two French mathematicians Blaise Pascal and Pierre de Fermat. Between 
1654 and 1660, they corresponded about a number of mathematical 
problems, including several concerning analysing odds in games of chance. 
This correspondence seems to have started when the Chevalier de Méré, a 
French nobleman and enthusiastic gambler, consulted Pascal about a 
problem that had arisen in a game of chance. Essentially, the question was 
about how the stakes should be divided between two players when their 
game is interrupted before either player has obtained the number of points 
required to win; this problem is often referred to as the Problem of Points. 
Solutions to this and another problem posed by de Méré can be found in 
the letters between Pascal and Fermat which survive to this day. 


All the early efforts to solve problems about games of chance were 
complicated by the absence of the idea of using a probability to measure 
the chance of an event occurring. The work that had been done on such 
problems was fragmentary, and there was no established method for 
tackling them. The arguments used in solutions were often lengthy, and 
frequently involved much use of ratio and proportion in order to calculate 
odds. The idea of a probability seems to have emerged in the latter part 
of the 17th century. By the time that James Bernoulli wrote his treatise 
Ars Conjectandi (The Conjectural Art) in the 1680s and 1690s, it appears 
to have become an accepted idea. With the advent of probability, many 
problems that had been complicated to solve using odds became relatively 
straightforward, and the explosion in the development and application of a 
more general theory of chance dates from this time. 


1.2 What is probability? 


The essence of a chance event is that you do not know whether or not it 
will happen. Nevertheless, in one particular sense, chance events can be 
regarded as predictable! Suppose, for instance, that a coin is to be tossed a 
large number of times. Although you cannot say whether it will land heads 
up or tails up on any particular toss (you can only guess), you can predict 
fairly accurately the proportion of times that it will land heads up. Since 
there is no reason to believe that either of heads or tails is more likely to 
occur than the other, you would expect the coin to land heads up 
approximately half of the time. 


Table 1.1 shows the results of the first 8 tosses in a sequence of 30 tosses of 
a pound coin. The second row of the table shows the outcome of each toss: 
h for heads and t for tails. The third row shows the total number of 
heads obtained so far; and the fourth row shows the proportion of heads so 
far, that is, the total number of heads so far divided by the number of 
tosses so far. The final row gives these proportions as decimals. 


Table 1.1 


Toss number 
Outcome (h or t) 
Number of heads so far 
Proportion of heads (P) 
P as a decimal 


Figure 1.4 shows a plot of P, the proportion of heads so far, on the vertical 
axis, against the toss number on the horizontal axis. Successive points 
have been joined with straight lines to show more clearly how the 
proportion of heads changes as the number of tosses increases. 


Figure 1.4 Proportion P for 30 tosses of a coin 


This work was not published 
until 1713, eight years after 
Bernoulli’s death. 


P(E) is usually read as ‘the 
probability of E’ or simply as 
‘P of E’. 


Take a coin and toss it 30 times, keeping a record of the result of each toss 
in a table similar to Table 1.1. (Your results will almost certainly be 
different from those shown in Table 1.1 and Figure 1.4.) 


Now plot your results on a graph similar to that in Figure 1.4. 


What do you notice about the way the proportion of heads changes as the 
number of tosses increases? 


Comment 


An interesting phenomenon is apparent from Figure 1.4, which you may 
also have observed for your results. For small numbers of tosses, there are 
quite large fluctuations in the proportion of heads observed. However, as 
the number of tosses increases, the differences between successive values of 
P tend to become smaller: the proportion of heads observed seems to be 
settling down to some constant value. The value towards which the 
proportion of heads is tending is called the probability of obtaining a 
head when the coin is tossed. Of course, it is possible that, in a sequence 
of only 30 tosses, the proportion of heads differs substantially from $\frac{1}{2}$: you 
may have obtained a value either greater than or less than $\frac{1}{2}$. To be 
confident that the proportion of heads really does approach the value $\frac{1}{2}$, a 
much longer sequence of tosses is required. However, tossing a coin a large 
number of times would be a lengthy and tedious exercise: in the computer 
section, you will be able to use the computer to simulate tossing a coin a 
large number of times to see what might happen if you actually carried out 
the tossing. 
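
The settling-down behaviour described in this comment can be sketched with a short simulation. What follows is only an illustration in Python, not the statistics package supplied with the course; the function name running_proportions and the use of a seeded pseudo-random generator are assumptions made for the example, as is the premise that heads and tails are equally likely.

```python
import random


def running_proportions(num_tosses, seed=None):
    """Simulate tosses of a coin and return the proportion of heads so far
    after each toss (the quantity called P in the text)."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        # Assumption: heads and tails are equally likely on every toss.
        if rng.random() < 0.5:
            heads += 1
        proportions.append(heads / toss)
    return proportions


if __name__ == "__main__":
    props = running_proportions(10_000, seed=1)
    for n in (10, 30, 100, 1_000, 10_000):
        print(f"after {n:>6} tosses: P = {props[n - 1]:.4f}")
```

Changing the seed gives a different sequence of tosses. The early values of P typically fluctuate a good deal, while the later values usually lie close to one half.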


The idea of assigning a number to an event which expresses how likely that 
event is to occur is fundamental to probability theory. In general, suppose 
that an event E (say) may or may not occur in an experiment, and that 
the experiment can be repeated as often as we like. For instance, the event 
E might be obtaining a head when a coin is tossed, or a six when a die is 
rolled. If the experiment is repeated many times, then the observed 
proportion of occasions on which the event E occurs will tend to settle 
down to some constant value as the number of times the experiment is 
repeated increases. This value is called the probability of the event E 
and is denoted P(E). 


As just stated, P(E), the probability of an event E occurring in a single 
experiment which can be repeated many times, is defined to be the 
long-run proportion of occasions on which E occurs. 


(a) What can you deduce about the range of values which are possible for 
probabilities? 

(b) What is the probability of an impossible event? For example, what is 
the probability of obtaining a 7 when a single six-sided die is rolled? 


(c) What is the probability of an event which is certain to occur? For 
example, what is the probability of obtaining a score between 1 and 6 
inclusive when a single six-sided die is rolled? 


Comment 


(a) Since a probability is a proportion (the proportion of occasions on 
which an event occurs), it is always a number between 0 and 1. 


(b) An impossible event never happens (you never score 7 with a six-sided 
die), so its probability (the proportion of occasions on which it occurs) 
is 0. 

(c) Similarly, an event which is certain to happen always occurs (you 
always get a score between 1 and 6 when you roll a six-sided die), so 
its probability is 1. 


We can summarise these results as follows. 


1. For any event E, 
$0 \le P(E) \le 1$. 


2. If an event E never happens, then P(E) = 0. 


3. If an event E is certain to happen, then P(E) = 1. 


The first property is a very useful one to remember. You can use it as a 
‘commonsense’ check in probability calculations: if a calculation results in 
a ‘probability’ outside the range 0 to 1, then you know you have made a 
mistake. 


We have now defined what is meant by the probability of an event. But 
how can we calculate probabilities in practice? In general, it is not feasible 
to carry out repeated experiments to estimate probabilities; and sometimes 
it is impossible — for instance, how could you calculate the probability that 
you will be involved in a motor accident within the next year? However, 
for coin-tossing, we believed that the two possible outcomes, heads and 
tails, were equally likely (and nothing else was possible), so we predicted 
that the proportion of tosses resulting in heads would be approximately $\frac{1}{2}$. 
Because of the symmetry of a coin, we were able to say, without carrying 
out a long sequence of tosses, that the probability of a head is $\frac{1}{2}$. Using the 
notation just introduced, this is written as $P(h) = \frac{1}{2}$. 


The idea of equally-likely outcomes can be used to calculate probabilities 
in many other situations. It is fundamental to many of the examples 
discussed in later sections of this chapter. Problems involving dice, for 
instance, can be tackled by assuming that, when a die is rolled, each of the 
six faces is equally likely to be uppermost when it lands. In a large number 
of rolls of a single die, we would expect each face to be uppermost for 
approximately $\frac{1}{6}$ of the rolls: so 

$$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}.$$

And, since 3 out of the 6 equally-likely outcomes are even numbers, we 
would expect an even number $\frac{3}{6}$ of the time, so 

$$P(\text{even number}) = \frac{3}{6} = \frac{1}{2}.$$
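
The counting argument used here (favourable outcomes divided by all equally-likely outcomes) can be written as a few lines of Python. This is a minimal sketch: the helper name probability is hypothetical, and the whole calculation rests on the assumption that each of the six faces is equally likely.

```python
from fractions import Fraction


def probability(event, outcomes):
    """P(event) when every outcome is assumed equally likely:
    the number of favourable outcomes divided by the total number."""
    outcomes = list(outcomes)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


die = range(1, 7)  # the six possible scores on one roll

p_three = probability(lambda score: score == 3, die)
p_even = probability(lambda score: score % 2 == 0, die)
p_seven = probability(lambda score: score == 7, die)      # impossible event
p_any = probability(lambda score: 1 <= score <= 6, die)   # certain event

print(p_three, p_even, p_seven, p_any)  # 1/6 1/2 0 1
```

Exact fractions are used rather than decimals so that the results can be compared directly with the values in the text. The impossible and certain events give 0 and 1, the two ends of the permitted range.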


The next two activities relate to situations where it may be assumed that 
all the possible outcomes are equally likely. 


For MST121, you are not 
expected to know how to 
calculate the number of 
different selections; this is 
calculated in MS221 Block B. 


A standard pack of 52 playing cards consists of four suits — hearts, clubs, 
diamonds and spades. Each suit contains thirteen cards — an ace, cards 
numbered 2 to 10, a jack, a queen and a king. A pack of 52 cards is 
shuffled thoroughly and the top card is turned face up. Write down the 
probability that this card is 


(a) the ace of spades, 
(b) an ace, 

(c) a heart. 
Comment 


A solution is given on page 59. 


(a) In the 1970s, the state of New Jersey in the USA had a lottery with a 
single 50000 dollar prize. One million tickets were sold: the tickets 
were numbered from 000000 to 999999. The winning ticket was 
identified by choosing a six-digit number at random, that is, in such a 
way that each six-digit number had an equal chance of being selected. 


What is the probability that a person will win such a lottery if he or 
she buys (i) one ticket, (ii) ten tickets? 

(b) In the British National Lottery, which was introduced in 1994, a player 
chooses six different numbers between 1 and 49 (inclusive). Each week 
six numbers are drawn at random, and a player wins a share of the 
jackpot if all his or her six numbers match the six numbers drawn. 
There are 13 983 816 different selections of six numbers between 1 and 
49 and, since the six numbers are drawn at random, the selections are 
all equally likely to occur. 


Find the probability that a player will win a share in the jackpot if he 
or she makes (i) one selection, (ii) ten different selections, (iii) 100 
different selections. 


Comment 


A solution is given on page 59. 
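
Although you are not expected to calculate the number of selections yourself, the figure quoted in part (b) can be checked by machine. The sketch below uses the standard Python function math.comb (available from Python 3.8), which counts the ways of choosing a given number of items from a set. The helper win_probability is a hypothetical name introduced for illustration, and it assumes that all outcomes are equally likely and that the entries bought are all different. To avoid giving away the activity, the usage line refers to an invented raffle of 20 tickets rather than to either lottery.

```python
from fractions import Fraction
from math import comb


def win_probability(entries, total_outcomes):
    """Chance of winning with `entries` different entries when each of
    `total_outcomes` outcomes is assumed equally likely."""
    return Fraction(entries, total_outcomes)


# Number of ways of choosing six different numbers from 1 to 49.
selections = comb(49, 6)
print(selections)  # 13983816

# Hypothetical raffle: 3 tickets held out of 20 sold.
print(win_probability(3, 20))  # 3/20
```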


Unfortunately, it is not always possible to calculate probabilities simply by 
considering equally-likely outcomes. For instance, you could not use this 
method to find the probability that when a drawing pin is dropped it will 
land point up; or to find the probability that it will snow this year in 
London on Christmas Day; or to find the probability that a person taking 
out health insurance will make a claim in the next year. In such cases, we 
must return to the definition of a probability as the long-run proportion of 
the time that an event occurs. In the case of the drawing pin, we could 
toss it a large number of times and hence estimate the probability that in 
a single toss it will land point up. 


Although we cannot carry out a large sequence of experiments to estimate 
the other two probabilities, we can make use of existing data. For instance, 
we could find out how many times snow has fallen in London on Christmas 
Day in the last hundred years, and use the proportion of years on which 
snow has fallen as an estimate of the probability that it will snow in 
London on Christmas Day this year. Similarly, an insurance company 
could estimate the chances of a potential customer making a claim using 
models based on information about, amongst other things, the claims 
records of similar policy holders; a decision could then be made about 
whether to issue a policy and what premium to charge. 


The idea of estimating probabilities from data has been accepted for 
several hundred years. In the 17th century, the Englishman John Graunt 
used data from the weekly Bills of Mortality to calculate empirical 
probabilities of various life events (for example, of dying from a particular 
disease, or in an accident, or in childbirth). 
He also estimated the population of London at risk in a number of plague 
years, and compared the severity of the different epidemics by estimating 
the proportion of the population who died of the plague in each year. 
He concluded that 1603 was the worst plague year (about half of the 
population at risk died), and that this outbreak was much more severe 
than the ‘Great Plague of London’ of 1665 (in which about a quarter of 
the population at risk died). 


The Bills of Mortality are 
described briefly in the essay 
on demography which 
accompanies Chapter B3. 


Modern statistical theory and practice has its roots in the two approaches 
to probability which have been discussed briefly in this subsection — the 
theoretical approach based on equally-likely outcomes, and the empirical 
approach based on the collection of data. The empirical approach was 
developed in England during the same period that European 
mathematicians, as a result of efforts to analyse games of chance, were 
developing a theory of probability. 


Empirical and theoretical probability 


In this subsection, you have been introduced to two approaches to 
probability: empirical and theoretical. A dictionary definition of 
‘empirical’ might be something like the following: 


... (of knowledge) based on observation or experiment, not on 
theory. 


So an empirical approach might involve taking observations from the 
past — whether or not it snowed in London on Christmas Day, say — 
and using the proportion of days that it snowed as an estimate of the 
probability that it will snow on Christmas Day this year. Or it might 
involve repeating an experiment many times — say, tossing a coin or 
rolling a die 100 times — and noting the outcomes. 


[Cartoon, captioned EMPIRICAL APPROACH: ‘That's 100 rolls. Each one 
comes up roughly one sixth of the time.’] 


Using an empirical approach, if you rolled a die a large number of times 
and got a six roughly $\frac{1}{6}$ of the time, then you would conclude 
that the probability of getting a six is approximately $\frac{1}{6}$. 


[Cartoon: ‘Hmm... six faces, symmetrical shape: they must all be equally 
likely, so each must have a probability of 1/6.’] 


A theoretical approach is based on the geometric symmetry of the coin or 
die. A coin has two faces, so, assuming that each face is equally likely 
to occur, each of the two outcomes (head and tail) has a probability of 
$\frac{1}{2}$. Using a similar argument, a die has six ‘identical’ faces, 
so, assuming that the six outcomes (1 to 6) are equally likely to occur, 
each outcome has a probability of $\frac{1}{6}$. 


Theoretical probabilities are calculated by making assumptions 
(such as those above based on the symmetry of a coin or die), while 
empirical probabilities are based on experimental evidence or on 
records of what has happened in the past. 
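
The contrast between the two approaches can be illustrated by simulating rolls of a die and comparing the observed proportions with the theoretical value. This is a sketch only: the function name empirical_probabilities is hypothetical, and the simulated die is assumed to be fair, so the comparison shows how closely empirical estimates approach 1/6 under that assumption.

```python
import random
from collections import Counter
from fractions import Fraction


def empirical_probabilities(num_rolls, seed=None):
    """Roll a simulated six-sided die `num_rolls` times and return the
    observed proportion of rolls showing each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}


theoretical = Fraction(1, 6)  # from symmetry: six equally-likely faces

if __name__ == "__main__":
    for n in (100, 100_000):
        estimates = empirical_probabilities(n, seed=2)
        worst = max(abs(p - float(theoretical)) for p in estimates.values())
        print(f"{n} rolls: largest gap from 1/6 = {worst:.4f}")
```

With only 100 rolls the observed proportions are typically rough estimates; with many more rolls they usually lie much closer to the theoretical value.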


In this subsection, the basic notation for the probability that an event 
occurs has been introduced: P(E) is the probability that an event E 
occurs, and is sometimes said simply as ‘P of E’. There have already been 
quite a number of statements in the text which have used this notation, 
including the following: 


$P(E) = 0$, $P(E) = 1$, $0 \le P(E) \le 1$, 
$P(h) = \frac{1}{2}$, $P(3) = \frac{1}{6}$, $P(\text{even number}) = \frac{1}{2}$. 


What words do you say to yourself as you read these statements? For 
example, how do you read the first statement? While the notation is new 
to you, you will probably find it helpful to use words that convey the full 
meaning of the statements; for example, you might read the first statement 
as ‘the probability that the event E occurs is zero’. However, later on, 
when you are more familiar with the language and ideas of probability, you 
may find that it is enough for you to say ‘P of E is zero’. 


Activity 1.6 Explain 


Look back at Activity 1.5, and say out loud to yourself each of the 
statements. 


On Learning File Sheet 1, note down points to remind yourself about the 
notation or how to read it. 


What words or phrases would you use if you were saying these statements, 
that is, explaining them, to a science student who had not just read this 
subsection? For any two of them, note down your response. 


What question could you ask your fellow student in order to check their 
understanding? What response would you expect? 


Comment 


It is one thing saying words and phrases to yourself, but often quite a 
different thing to say them to other people — especially if the person or 
persons are not expert in that particular field. However, presenting your 
own ideas and views and giving information to non-experts is often an 
important part of a project — you may, for instance, be working with a 
group of student scientists or psychologists who are setting up different 
experiments and trials. Helping them to understand the ideas related to 
probability may be crucial to the success of the project. But thinking how 
you could help others to understand ideas can also be useful for you, 
because it forces you to be clear about your own knowledge and 
understanding. 


Look at the notes you have made for the two statements. Are you 
convinced that a non-expert could really make sense of them? Are there 
any ambiguities? Are all the words clear? Have you used any technical 
words or symbols? Have you explained them? Obviously, it is important to 
use terminology precisely, but if you are speaking to non-experts you also 
need to be able to explain it clearly. 


Check the question you have designed. Does it really assess understanding? 


[Cartoon caption: ‘So this last toss is bound to be heads, to make it 50:50.’ ‘I think you've misunderstood the problem about “tossing heads”, Monsieur d'Alembert!’]


1.3 The questions 


This subsection consists of a selection of problems and puzzles. You are 
not expected to be able to solve the problems at this stage, but wherever 
possible use your experience and intuition to ‘guess’ an answer. Write 
down your ‘guesses’, so that later you can compare them with the answers 
to the problems. You will be able to explore some of the problems in the 
computer section which follows. The tools for tackling the problems are 
developed in the remaining sections of this chapter, and each problem will 
be revisited as the necessary ideas and techniques are introduced. You 
may find the answers to some of the questions surprising, so do not worry 
too much about whether your responses are correct. Finding out which of 
your intuitions are in need of possible revision, and why, will help you to 
gain a better understanding of the nature of chance events. 


The Brains Trust 


Some years ago, BBC Radio broadcast a regular programme called 
the Brains Trust. Each week, a selection of questions which had been 
sent in by listeners were put to a panel of ‘brains’. In one programme 
during the Second World War, the panel was asked: ‘What is the law 
of averages?’ One member of the panel, Dr C. E. M. Joad, replied: 
‘The law of averages says that if you spin a coin a hundred times, it 
will come down heads fifty times, and tails fifty times.’ 


Do you think this is correct? If in doubt, try tossing a coin a hundred 
times to see what happens! And if you do get fifty heads and fifty 
tails, try repeating the experiment! What do you understand by the 
phrase ‘the law of averages’? How would you use coin-tossing to 
explain ‘the law of averages’? 


D’Alembert’s heads 


The Frenchman Jean d’Alembert was one of the great 
mathematicians of the 18th century. In 1754, the following problem 
was proposed to him: in two tosses of a coin, what is the probability 
that the coin will land heads at least once? He argued that there are 
three cases: heads on the first toss, heads on the second toss, and 
heads on neither toss. Two of these three give at least one head; 
therefore, he argued, the probability required is 2/3. 


What do you think? Do you agree with d’Alembert’s argument? 


The three-card game 


An entertainer at a fairground invites members of the public to bet 
50p on the outcome of a three-card game. Three cards are put into a 
hat: one is white on both sides, one is red on both sides, and the third 
is white on one side and red on the other. If you decide to play the 
game, then he lets you choose a card from the hat without looking at 
it, and place it flat on a table. If the side showing is red, then he says: 
‘This isn’t the white-white card, so it must be one of the other two. 
I'll bet you 50p that the other side is red.’ (So, in this case, he will 
pay you 50p if the other side is white.) And similarly, if the side 
showing is white, he offers to bet you 50p that the other side is white. 


[Cartoon caption: ‘I still prefer the three cardigan trick!’]


Would you accept the wager? Do you think it is a fair bet? 


Galileo and the three-dice problem 


Galileo Galilei was born in Pisa in 1564. He was educated in a Jesuit 
monastery until he was sixteen, and then spent a short time in 
commerce. After this, he studied medicine at the University of Pisa 
and, at the age of 25, became a professor of mathematics there. He 
later moved to Padua and, in 1613, from there to Firenze (Florence). 
At this time of his life, Galileo was almost entirely occupied with 
astronomy, but at some time between 1613 and 1623, he seems to 
have been instructed to look at a problem concerning the total score 
obtained when three dice are rolled. In his own words, he was 
‘ordered to produce’ whatever occurred to him about the problem. 
(Presumably he was asked to look at the problem by his 
employer/patron, the Grand Duke of Tuscany.) 


[Cartoon caption: ‘Galileo was also known as the “Leaning Tower of Pizza”.’]


The problem was essentially as follows. There are six 3-partitions of 
9: that is, there are six different sets of three die scores which add up 
to 9, namely (621), (531), (522), (441), (432), (333). There are also 
six 3-partitions of 10: (631), (622), (541), (532), (442), (433). Galileo 
was asked to investigate why, even though there are the same number 
of 3-partitions of 9 as there are of 10, 10 seems to be ‘more 
advantageous’ in practice. (Presumably the Grand Duke regarded a 
total of 10 as more advantageous because he had observed that it 
occurred more often than a total of 9.) 


What do you think about this problem? Is a total of 10 more likely 
than a total of 9 when three dice are rolled? Write down your ideas. 
We shall return to this problem in Section 3, where you will see how 
Galileo tackled it. 


[Cartoon caption: ‘I think you need a little more topspin on the back-hand, Monsieur de Méré!’]


The Chevalier de Méré: sixes and double-sixes 


In his correspondence with Pierre de Fermat, Blaise Pascal raised a 
problem which had been brought to him by the Chevalier de Méré. 
According to Pascal, the Chevalier claimed to have ‘found falsehood 
in the theory of numbers’. The Chevalier made this claim because 
two wagers, which he had reasoned to be equally advantageous, had 
proved not to be so in practice. He had correctly calculated the odds 
of rolling at least one six with four rolls of a single die, and found that 
they are favourable (that is, the probability of doing so is greater 
than 1/2). According to Pascal, the Chevalier reasoned that, since 
‘24 is to 36 (which is the number of pairings of the faces of two dice) 
as 4 is to 6 (which is the number of faces of one die)’, it should also 
be advantageous to bet on rolling a double-six in twenty-four rolls of 
two dice. 


Which do you think is the more likely: rolling at least one six with 
four rolls of a single die, or rolling at least one double-six with 
twenty-four rolls of a pair of dice? Or do you agree with the Chevalier 
that they should be equally-likely events? Could it be that he simply 
had a run of bad luck at the gambling tables? Or was he right to 
suspect that the second event was less likely than the first? 


Balanced families 


The Watsons regard one boy and one girl as the ideal family. When 
they married, they reasoned that since boys and girls are equally 
likely, they had an even chance of getting one boy and one girl in 
their planned family of two. 


For the Johnsons, two boys and two girls is the ideal family. They 
also reckoned that, because boys and girls are equally likely, their 
chances of achieving their ideal family were fifty-fifty. 


Do you think the Watsons and Johnsons are right about their 
chances? 


Waiting for a girl 


Some couples with children feel that their family is not complete until 
they have at least one boy and one girl. Some long for a boy. Others 
long for a girl. 


Suppose that a couple who want a daughter decide to continue having 
children until a girl is born. They could be ‘lucky’ with their first 
child turning out to be a girl, or they may have a long line of boys 
before eventually having a daughter. A number of questions arise, the 
answers to which might well be of interest to the parents. For 
example, if boys and girls are equally likely, how many children 
should they expect to have before their family is complete? That is, 
what is the average size of families who continue having children until 
a girl is born? What is the most likely size for their family? What is 
the probability that they will have more than four children? 


Waiting for a six 


In some board games, players can join in the game only when they 
obtain a six on the roll of a die. Several questions spring to mind 
here. First, on average, how many times will a player have to roll the 
die in order to start? Secondly, what is the most likely number of 
rolls needed? And what is the chance that a player will still be 
waiting to join in after 10 rolls, or after 20 rolls? 


Write down your ideas about these questions. What does your 
intuition suggest to you? 


Collecting a complete set of musicians 


Some time ago, a certain cereal manufacturer offered eight different 
toy musicians as gifts in packets of a particular popular breakfast 
cereal. Each packet contained one musician only, but there was no 
way of knowing which it contained without opening the packet. How 
many packets might you expect to have to buy in order to acquire a 
complete set of musicians? That is, what is the average number of 
packets that a family might have to buy to acquire a complete set? 
Write down your ‘intuitive’ answer to this question. 


Coinciding birthdays 


Suppose that each of the 24 children (no twins) in a (small) class 
decides to give a party on their birthday. What do you think the 
chances are that at least two of the children will need to hold a joint 
party as their birthdays are on the same day of the year? 


Summary of Section 1 


In this section, the idea of a probability has been introduced and you have 
been invited to propose answers to a selection of problems involving chance 
events. In the next section, you will be invited to explore some of these 
problems using your computer and to make some further hypotheses using 
your results. You have also made a start at thinking how you could explain 
ideas to non-experts. 


[Cartoon caption: ‘This is so unlike me: I usually get a six first go!’]

[Cartoon caption: ‘Anastasia love, we're chock-a-block with brass players already!’]

[Cartoon caption: ‘Wow! Our birthdays are exactly 33 days apart! What are the chances of that happening?’]


To study this section, you will need access to your computer, together with 
the statistics software and Computer Book D. 


In Section 1, the probability of an event was defined to be the proportion 
of times that the event occurs ‘in the long run’. So, in 
theory, we could estimate the probability of an event by carrying out a 
long sequence of identical experiments and observing the results. In 
Activity 1.1, you tossed a coin 30 times and noted the proportion of times 
that it landed heads up. It was observed that the proportion of heads 
fluctuated as the number of tosses increased, but that it seemed to be 
settling down to some constant value. It looked as though this value might 
be 1/2, but with only 30 tosses, we could not be absolutely sure of this: we 
really need to carry out a much longer sequence of tosses. No doubt you 
found that tossing a coin and recording the outcome soon became tedious, 
even for as few as 30 tosses, so you certainly would not want to toss a coin 
300 times! Fortunately, the computer can help you with this sort of task. 
Of course, the computer does not actually toss a coin, but instead it 
generates outcomes according to a random procedure. In the case of 
tossing a coin, it generates sequences of ‘heads’ and ‘tails’ which are 
indistinguishable from the sorts of results you might get if you actually 
tossed a coin. This sort of alternative to carrying out a real experiment is 
known as simulation. 
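
The ‘settling down’ behaviour can be illustrated with a short program. The following is only a minimal sketch in Python, not the course software described in Computer Book D; the function name running_proportions and the seed value are assumptions made for this illustration.

```python
import random


def running_proportions(n_tosses, seed=None):
    """Simulate tosses of a fair coin and return the proportion of
    heads observed after each toss (illustrative helper, not part of
    the course software)."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat this as 'the coin lands heads'
            heads += 1
        proportions.append(heads / toss)
    return proportions


props = running_proportions(300, seed=1)
for n in (10, 30, 100, 300):
    print(f"after {n:3d} tosses: proportion of heads = {props[n - 1]:.3f}")
```

Changing the seed gives a different sequence of tosses; you would expect the early proportions to vary a good deal from run to run and the later ones to vary much less.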


In this section, you will be using computer simulations, first to investigate 
further the ‘settling down’ phenomenon noticed with coin-tossing, and 
then to explore some of the problems described in Subsection 1.3. 


Refer to Computer Book D for the work in this section. 


Summary of Section 2 


The main purpose of this section has been to encourage you to develop 
your understanding of the nature of randomness and to compare some of 
your intuitions from Subsection 1.3 with what happens in practice. 

You have had the opportunity to question your intuitions and to start to 
develop hypotheses about some results. And you have gained experience in 
using a range of simulations to model situations involving chance. 


In Subsection 1.2, the probability of an event was defined as the long-run 
proportion of times that the event occurs. It was noted that 
in some situations involving chance, the probability of a particular 
outcome can be calculated without recourse to carrying out a sequence of 
trials or to collecting masses of data. This is the case whenever it is clear 
that the different possible outcomes are all equally likely to occur. For 
example, when tossing a coin, the coin seems as likely to land heads up as 
tails up; and when a die is rolled, each of the six faces seems equally likely 
to come up. 


Several of the problems from Subsection 1.3 can be tackled using the idea 
of equally-likely outcomes — for example, The three-card game, 
D’Alembert’s heads, Balanced families, Galileo and the three-dice problem 
and The Chevalier de Méré: sixes and double-sixes. The aim of this 
section is to introduce some basic rules for calculating probabilities and to 
use them to tackle each of these problems in turn. 


As you work through this section, continue the ‘imaginary dialogue’ you 
started in Section 1. Imagine you are working with a team of students who 
are not mathematicians, and you are discussing ideas with them and 
guiding them in their thinking and understanding about probability. 
Think about the language you would use, how you would explain the ideas, 
and how you might check that they understood what you were saying. 


Before beginning work on these problems, consider briefly the way in 
which the two words outcome and event have been used in this chapter. 
These words have not been used interchangeably: care has been taken over 
when each is used. We have spoken of ‘the possible outcomes of an 
experiment’ and of various ‘events associated with an experiment’. 

The best way for you to sort out the distinction between ‘outcomes’ and 
‘events’ is to consider some examples. 


(a) Suppose that an experiment involves rolling a die with faces numbered 
from 1 to 6 and noting the score on the uppermost face. First write 
down all the possible outcomes of the experiment: there are six of 
them. Then write down at least three events associated with the 
experiment (there are many possibilities for these). If you are not sure 
of the difference between ‘outcomes’ and ‘events’, then look back at 
some of the examples and activities in Subsection 1.2 to help you make 
your lists. 


(b) Another experiment involves drawing a card from a well-shuffled pack 
of 52 playing cards and noting which card it is. There are 52 possible 
outcomes of the experiment. What are they? Write down three events 
associated with the experiment. 


The distinction between outcomes and events will be useful in this section 
when developing some basic rules for calculating probabilities. Try to 
express in your own words what you understand to be the difference 
between ‘outcomes’ and ‘events’. Explain to a non-mathematics student 
how to use the words precisely and what the difference is between 
outcomes and events. Use Learning File Sheet 2 if you wish. 


Consider first experiment (a), in which a die is rolled and the number on 
the uppermost face is noted. As a result of carrying out this experiment, a 
number between 1 and 6 (inclusive) is noted. The number noted is the 
outcome of the experiment; it describes precisely what occurs when the 
specific experiment is run. In this case, there are six possible outcomes of 
the experiment: the numbers 1 to 6. An event can be any happening 
associated with the experiment though it does, of course, include all the 
outcomes as possible events. Some examples of events associated with this 
experiment are ‘obtaining an even number’, ‘obtaining a number greater 
than 4’, ‘obtaining a 6’ and ‘obtaining a multiple of 3’. 


Similarly, in experiment (b), when a card is drawn from a pack of 52 
playing cards and its suit and number are noted, there are 52 possible 
outcomes: ace of hearts, two of hearts, three of hearts, ..., king of spades. 
Examples of events associated with this experiment include ‘obtaining a 
heart’, ‘obtaining an ace’, ‘obtaining a black card’ and ‘obtaining a red 
queen’. 


In general, an outcome is the precise result of an experiment (such as 
getting a six when a die is rolled), whereas an event is any happening 
associated with the experiment: it may be one of the possible outcomes of 
the experiment, such as ‘obtaining a six’, or it may be a more general 
happening, such as ‘getting an even number’. One part of probability 
theory involves developing ways of calculating the probability of any event 
given only the probability of the possible outcomes. 


When you are explaining technical terms to someone else, it may be useful, 
for instance, to use examples. Try to think what is helpful to you, and use 
a similar technique. In a discussion with other students, how might you 
create opportunities for others to try out their understanding? 


3.1 Counting problems 


The idea of counting equally-likely outcomes was used in Subsection 1.2 to 
find the probability of each of a number of events. For example, there are 
two outcomes when a coin is tossed: heads and tails. Assuming that these 
are equally likely to occur, each outcome will occur half the time in the 
long run, so 


P(head) = P(tail) = 1/2. 


There are six outcomes when a die is rolled: the faces are numbered 
1 to 6. It has been known for some gamblers to cheat by ‘loading’ their 
dice so that, when rolled, some faces are more likely to come up than 
others. However, assuming that a die is not loaded, the six possible 
outcomes, 1 to 6, are equally likely, so 


P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6. 


Similarly, if a card is drawn from a pack of 52 playing cards, there are 
52 possible outcomes and these are all equally likely, so, for example, 


P(ace of spades) = 1/52, P(seven of hearts) = 1/52. 


In general, if an experiment (tossing a coin, rolling a die, picking a card, 
etc.) has N possible outcomes and these are all equally likely, then for any 
particular outcome, the probability that it occurs is 1/N; that is, 


P(particular outcome) = 1/N. 


To find the probability that an even score is obtained when a die is rolled, 
we also counted the number of outcomes that give an even number, and 
hence found the proportion of outcomes which give an even score. Three of 
the six possible scores are even (2, 4 and 6), so 


P(even score) = 3/6 = 1/2. 


Similarly, to find the probability of drawing an ace from a pack of 
52 cards, we counted the number of aces (4) and hence found the 
proportion of outcomes that give an ace: 


P(ace) = 4/52 = 1/13. 


These are examples of the following general result. 


If an experiment has N equally-likely possible outcomes, and n(E) is 
the number of these outcomes that result in an event E occurring, 
then 


P(E) = n(E)/N,    (3.1) 


that is, P(E) is equal to the number of outcomes for which the event 
E occurs divided by the total number of possible outcomes. 
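
The counting rule can also be written as a few lines of code. The sketch below is an illustration only: the function name probability and the way an event is represented (as a test applied to each outcome) are assumptions made for this example, and the rule applies only when the listed outcomes really are equally likely.

```python
from fractions import Fraction


def probability(outcomes, event):
    """P(E) = n(E)/N for equally-likely outcomes.

    `event` is a function that returns True for the outcomes in which
    the event occurs (an illustrative representation)."""
    outcomes = list(outcomes)
    n_event = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(n_event, len(outcomes))


# Rolling a die: the event 'even score'.
die = range(1, 7)
print(probability(die, lambda score: score % 2 == 0))  # 1/2

# Drawing a card from a pack of 52: the event 'ace'.
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
suits = ["hearts", "diamonds", "clubs", "spades"]
pack = [(rank, suit) for suit in suits for rank in ranks]
print(probability(pack, lambda card: card[0] == "ace"))  # 1/13
```

Exact fractions are used rather than decimals so that the results can be compared directly with the values obtained by counting.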


This result can be used to answer several of the questions which were 
posed in Section 1. The three-card game, D’Alembert’s heads, Balanced 
families and Galileo and the three-dice problem can all be tackled by 
counting equally-likely outcomes and using formula (3.1). We shall begin 
by looking at the problem of the three-card game, which is the simplest of 
these to solve. 


The three-card game 


In this game, three cards are put into a hat: one of the cards is white on 
both sides, one is red on both sides, and the third is white on one side and 
red on the other. One of the three cards is drawn at random from the hat 
and placed flat on a table. If the upper side of the card on the table is red, 
the fairground entertainer offers to bet you 50p that the other side is red, 
since, as he says, ‘This isn’t the white-white card, so it must be one of the 
other two’. Would you accept the wager? 


At first sight, the bet might seem a fair one: there are two possibilities: 
either the card is the red-red one or it is the red-white one. However, the 
fairground entertainer is no fool — his trick is a good steady earner for him. 
To see why this is so, we need to calculate the probability that the other 
side of the card is red. And to do this, we must first identify the 
equally-likely outcomes involved in the situation. The side showing could 
be any one of the three red sides and, since the card is selected at random 
and placed on the table, each of these three sides is equally likely to be the 
one showing. In one case the other side is white, and in the other two cases 
the other side is red, so 


P(other side is red) = 2/3. 


The crucial point here is that the equally-likely outcomes are the sides of 
the cards, not the cards themselves. 


If you find this difficult to understand, then imagine that the sides of the 
cards are numbered: R1 and R2 on the red-red card, W1 and W2 on the 
white-white card, and R3 and W3 on the red-white card. Suppose that a 
card is selected at random and placed on the table, and the side showing is 
red. Since the card is selected at random, no side is more likely than any 
other side to come up — R1, R2 and R3 are equally likely. If the side 
showing is R1, then the other side is R2; if it is R2, then the other side is 
R1; and if it is R3, then the other side is W3 (see Figure 3.1). 


Figure 3.1 The other sides 


In two of these three cases, the other side is red and hence the probability 
that the other side is red is 2/3. So, in the long run, the entertainer will win 
approximately 2/3 of his wagers, and hence make a good profit. 
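
The argument based on numbering the sides can be checked by listing the six sides explicitly. The following sketch is an illustration only; the letters "R" and "W" and the variable names are assumptions made for this example.

```python
from fractions import Fraction

# Each card is described by its two sides.
cards = [("R", "R"), ("W", "W"), ("R", "W")]

# List (side showing, other side) for each of the six sides, which
# are taken to be equally likely to be the side showing.
sides = []
for first, second in cards:
    sides.append((first, second))
    sides.append((second, first))

# Keep only the cases in which the side showing is red.
red_showing = [other for showing, other in sides if showing == "R"]
p_other_red = Fraction(red_showing.count("R"), len(red_showing))
print(p_other_red)  # 2/3
```

Notice that the list sides has six entries, not three: it is this list, not the list of cards, that contains the equally-likely outcomes.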


Many people find this result difficult to believe even after following the 
above argument. If you are not convinced, then try an experiment. Take 
three pieces of card and label the sides ‘red’ or ‘white’ so that one card is 
labelled red on both sides, one is labelled white on both sides and the third 
is labelled red on one side and white on the other. Then carry out a 
sequence of trials. Place the three pieces of card in a bag or a hat, remove 
a card at random and place it flat on a table without looking at it first. If 
the side showing is labelled red, then note down the label on the other side 
(red or white). (If the side showing is labelled white, return the card to the 
bag and start again.) Repeat this to obtain a sequence of results. Estimate 
for yourself the proportion of the time that the other side is labelled red. 
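
If you would rather let a computer do the repetitions, the experiment just described can be imitated as follows. This is a sketch under stated assumptions, not the course software: the function name three_card_trials, the number of trials and the seed are all choices made for this illustration.

```python
import random


def three_card_trials(n_red_trials, seed=None):
    """Repeat the three-card experiment until the side showing has
    been red n_red_trials times; return the proportion of those
    trials in which the other side was also red."""
    rng = random.Random(seed)
    cards = [("red", "red"), ("white", "white"), ("red", "white")]
    recorded = 0
    other_red = 0
    while recorded < n_red_trials:
        card = rng.choice(cards)   # draw a card at random
        up = rng.randrange(2)      # either side may land face up
        showing, other = card[up], card[1 - up]
        if showing != "red":
            continue               # white showing: return the card, start again
        recorded += 1
        if other == "red":
            other_red += 1
    return other_red / n_red_trials


estimate = three_card_trials(10_000, seed=2)
print(f"estimated proportion with a red other side: {estimate:.3f}")
```

The estimate will differ slightly from run to run if you change the seed, but you can compare it with the value obtained by counting equally-likely sides.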


When you have convinced yourself, how would you go about convincing 
someone else? Would you adopt the same strategy as given here? 


If the side of the card showing on the table is white, the fairground 
entertainer offers to bet you 50p that the other side is white. Is this a fair 
bet? 


Comment 


A solution is given on page 59. 


D’Alembert’s heads 


D’Alembert argued that when a coin is tossed twice, the probability that 
at least one head is obtained is 2/3. One set of results obtained from running 
the simulation in Activity 2.4 (in Computer Book D) is shown in 
Figure 3.2. 


[Frequency diagram omitted; horizontal axis: number of heads]


Figure 3.2 The results of a simulation 


In the simulation, each trial consisted of tossing a coin twice and noting 
the number of heads obtained (0, 1 or 2); 100 trials were carried out, and a 
frequency diagram displayed for the number of heads obtained in the trials. 
Figure 3.2 shows that in this simulation, at least one head was obtained in 
73 (= 53 + 20) out of the 100 trials. The proportion of trials in which at 
least one head was obtained is 0.73, somewhat greater than 2/3. Did you 
obtain similar results? In each of your simulations, was the proportion of 
trials in which at least one head was obtained greater than 2/3? 


In four further simulations that we carried out, the proportions obtained 
were 0.68, 0.78, 0.71, 0.75. If these results are typical, then they suggest 
that the probability of obtaining at least one head in two tosses of a coin is 
greater than 2/3, and d’Alembert was wrong! 
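
A simulation of this kind is easy to imitate. The sketch below is not the simulation from Computer Book D: the function name, the seed and the numbers of trials are assumptions made for this illustration. It carries out five runs of 100 trials and then one much longer run, so that the variation between short runs can be compared with the value suggested by a long run.

```python
import random


def proportion_at_least_one_head(n_trials, rng):
    """Each trial is two tosses of a fair coin; return the proportion
    of trials in which at least one head is obtained."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(2))
        if heads >= 1:
            hits += 1
    return hits / n_trials


rng = random.Random(3)
small_runs = [proportion_at_least_one_head(100, rng) for _ in range(5)]
long_run = proportion_at_least_one_head(100_000, rng)
print("five runs of 100 trials:", small_runs)
print("one run of 100000 trials:", long_run)
```

Compare the proportion from the long run with d’Alembert’s proposed value.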


As with the three-card game, the key step in investigating the problem of 
D’Alembert’s heads is to identify the equally-likely outcomes involved in 
the situation. You are asked to do this in the next activity. 


Activity 3.3 Two tosses of a coin 


List all the possible outcomes of tossing a coin twice, using h to represent 
a head and t for a tail. How many outcomes have you listed, and are they 
all equally likely? What do you make the probability of obtaining at least 
one head in two tosses of a coin? 


Comment 


It is important in this activity to distinguish between the results of the 
first and second tosses. Writing the results of the tosses in the order in 
which they occur, we obtain four possible outcomes: 


hh, ht, th, tt, 


where, for example, ht means the first toss results in a head and the 
second in a tail. 


Since a head and a tail are equally likely to occur on each toss, these four 
outcomes are equally likely. In three of these four outcomes (hh, ht and 
th) at least one head occurs so, using formula (3.1), the probability of 
obtaining at least one head in two tosses of a coin is 3/4. As we suspected, 
d’Alembert was wrong! 


Look back at d’Alembert’s argument. Can you see where he went wrong? 
He identified three events, but these were not equally likely; he did not 
identify the equally-likely outcomes of tossing a coin twice. 
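
To see that d’Alembert’s three cases are not equally likely, the probability of each can be found by counting the four equally-likely outcomes. In the sketch below, his cases are read as ‘a head on the first toss’, ‘a tail and then a head’ and ‘no heads’; this reading, and the names used in the code, are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally-likely outcomes of two tosses: hh, ht, th, tt.
outcomes = ["".join(tosses) for tosses in product("ht", repeat=2)]


def prob(event):
    """Formula (3.1): the proportion of outcomes in which the event occurs."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


# d'Alembert's three cases, under the reading described above.
cases = {
    "head on the first toss": lambda o: o[0] == "h",
    "tail then head": lambda o: o == "th",
    "no heads": lambda o: o == "tt",
}
for name, event in cases.items():
    print(name, prob(event))

print("at least one head", prob(lambda o: "h" in o))
```

The three cases have probabilities 1/2, 1/4 and 1/4, so treating them as equally likely gives the wrong answer.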


(a) List all the possible outcomes of tossing a coin three times. 


(b) Write down the probability that in three tosses of a coin: 
(i) no heads are obtained; 
(ii) at least one head is obtained; 
(iii) at least two heads are obtained; 
(iv) exactly two heads are obtained. 


Comment 
A solution is given on page 59. 


Note that it is important to be systematic when listing the possible 
outcomes of an experiment, so as to avoid missing any out. 


Did you notice that the two probabilities calculated in parts (b)(i) and 
(b)(ii) of this activity summed to 1? Since either there are no heads in 
three tosses or there is at least one head, one or other of these two events 
is certain to occur. Hence the probability that one or other of the events 
occurs is equal to 1. Since the two events cannot occur simultaneously, 
their separate probabilities must add up to 1. This is an example of the 
following useful rule for probabilities. 


If E is an event and not-E is the opposite event (that E does not 
occur), then 


P(E) + P(not-E) = 1, 


or, equivalently, 


P(E) = 1 - P(not-E).    (3.2) 


This rule is a particularly useful one in problems where it is easier to 
calculate the probability that a particular event does not occur than it is 
to calculate directly the probability that it does. For example, the rule 
could be used to calculate the probability of at least one head in three 
tosses of a coin without counting all the possible ways in which at least one 
head can occur. Using (3.2), 


P(at least one head) = 1 - P(no heads) 
                     = 1 - 1/8 
                     = 7/8, 


which is the answer you obtained in Activity 3.4 by counting outcomes. 
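
The agreement between the two methods can be confirmed by listing the eight outcomes of three tosses. This is a minimal sketch; the names outcomes and prob are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# The eight equally-likely outcomes of three tosses of a coin.
outcomes = list(product("ht", repeat=3))


def prob(event):
    """Formula (3.1): the proportion of outcomes in which the event occurs."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_no_heads = prob(lambda o: "h" not in o)
p_at_least_one = prob(lambda o: "h" in o)   # found by direct counting

print(p_no_heads)        # 1/8
print(p_at_least_one)    # 7/8
print(p_at_least_one == 1 - p_no_heads)  # True
```

Direct counting needs all seven outcomes containing a head to be identified, whereas the rule needs only the single outcome containing none.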


Balanced families 


In the 18th century, it was observed that patterns in the sequences formed 
by the sexes (male and female) of successive births in city hospitals were 
not unlike the patterns of heads and tails resulting from successive tosses 
of a coin. This suggests a possible simple model for births: the probability 
that the next child born is a girl is 1/2 and the probability that it is a boy is 
1/2, and these probabilities are the same whatever the sex of children born 
previously. 


If we accept this model, then questions about family patterns can be 
tackled by methods which have been developed to answer problems about 
coin-tossing. For example, if we make the analogy between a family of two 
girls and getting two heads in two tosses of a coin, then the proportion of 
families of size two consisting of two girls should be approximately equal to 
the probability of getting two heads in two tosses of a coin. This model 
can be used to tackle questions like Balanced families, for instance. 


In fact, as long ago as the middle of the 17th century, John Graunt 
discussed the sex ratio: the records of christenings that he examined 
seemed to suggest that rather more boys than girls were born. And in the 
18th century, Abraham de Moivre remarked on observations made by 
Nicholas Bernoulli (1687-1759) as follows. 


Mr Bernoulli collects from Tables of Observations continued for 
82 years, that is from A.D. 1629 to 1711, that the number of Births in 
London was, at a medium, about 14000 yearly: and likewise, that the 
number of Males to that of Females ... is nearly as 18 to 17. But he 
thinks it the greatest weakness to draw any Argument from this 
against the Influence of Chance in the production of the two sexes. 
For, says he, ‘Let 14000 Dice, each having 35 faces, 18 white and 
17 black, be thrown up, and it is great Odds that the numbers of 
white and black faces shall come as near, or nearer, to each other, as 
the numbers of Boys and Girls do in the Tables.’ 


(This quotation is taken from The Doctrine of Chances by Abraham de 
Moivre, which was published in London in 1756.) 


More recent investigations have confirmed that the proportion of babies 
born that are boys is slightly greater than 1/2. Nevertheless, the simple 
model that has been suggested can be used to obtain approximate results 
for problems such as Balanced families; and, as you will see later in this 
section, it is not difficult to modify the model to take account of the slight 
imbalance between male and female births. But in the next activity, you 
should assume the simple model; that is, you should assume that each 
child born is equally likely to be a girl or a boy. 


Activity 3.5 Balanced families 


(a) (i) List all the possible patterns of families of two children, using G to 
represent a girl and B for a boy. Take care to distinguish between the 
first born and the second born. 

(ii) The Watsons’ ideal family is one girl and one boy. What is the 
probability that they will achieve their ideal family? 

(b) (i) List all the possible patterns of families of four children. How 
many different patterns are there? 

(ii) The Johnsons’ ideal family is two girls and two boys. What is the 
probability that they will achieve their ideal family? 

(c) Are the Watsons and the Johnsons correct to believe that their 
chances of achieving their ideal families are both fifty-fifty? 


(d) How could you explain to them the results of your work? 


Source: Considerazione sopra il Giuoco dei Dadi (Thoughts about Dice Games). 


Comment 
A solution to parts (a)—(c) is given on page 59. 


(d) In discussions, it is often necessary to explain quite complex issues to
different groups of people. Statisticians need to be able to convey their
ideas clearly to non-specialists. This may involve explaining
complicated reasoning, handling sensitive issues or helping others to
interpret results, and it may require careful choice of vocabulary and
careful structuring of what is said. In any discussion, you need to be
able to vary and adapt your own contributions to take account of the
other people involved, the particular item being talked about, and so
on. It is useful to think about these things as you work through some
of the examples. How might you deal with complicated ideas or
sensitive issues?


Galileo and the three-dice problem 


When Galileo wrote down his ideas about the problem of why, when three 
dice are rolled, a total of 10 seemed to be ‘more advantageous’ than a total 
of 9, he started by counting the total number of different possible 
outcomes. He began as follows. 


... since a die has six faces, and when thrown it can equally well fall
on any one of these, only six throws can be made with it, each 
different from all the others. But if together with the first die we 
throw a second, which also has six faces, we can make 36 throws each 
different from all the others, since each face of the first die can be 
combined with each face of the second .... 


For tossing two coins, it is important to distinguish between the first coin 
and the second. In the same way, it is important to distinguish between 
the score on the first die and the score on the second; so, for example, 5 on 
the first die and 4 on the second is a different outcome from 4 on the first 
die and 5 on the second. To remind yourself that these are different
outcomes, it is sometimes helpful to imagine the dice being different
colours; then the fact that they are different outcomes becomes clearer. 
The argument that Galileo used to calculate the number of possible 
outcomes when two dice are rolled can be extended to three dice. 


How many different possible outcomes are there when three dice are 
rolled? (Remember to distinguish between the three dice when counting 
outcomes — for instance, by imagining that they are different colours.) 


Comment 


For each of the 36 outcomes for the first two dice, there are 6 possible 
outcomes for the third die. Hence there are 36 × 6 = 216 different possible
outcomes when three dice are rolled, and these are all equally likely. 


Having calculated that the total number of possible outcomes is 216, 
Galileo produced a table showing the number of possible outcomes which 
give totals of 3, 4, 5, 6, 7, 8, 9 and 10. (He also observed that the totals for 
11 to 18 were symmetrical with these; for example, the number of ways of 
getting a total of 11 is equal to the number of ways of getting a total of 10, 
and the number of ways of getting a total of 12 is the same as the number 
of ways of getting a total of 9. We shall not be checking all his results!)


(a) Galileo’s table indicated that the number of possible outcomes giving a 
total of 9 is 25. However, as already noted, there are only 6 different 
sets of scores of three dice which add up to 9: (621), (531), (522), 
(441), (432), (333). How do you account for this difference? 


(b) List all the possible outcomes that lead to a total score of 9, and hence 
confirm that Galileo’s figure of 25 is correct. What is the probability 
of obtaining a total score of 9 when three dice are rolled? 


(c) Count the possible outcomes which give a total of 10, and hence find 
the probability of obtaining a total score of 10 when three dice are 
rolled. 


Comment 


A solution is given on page 60. 
In this activity, you found that the probability of a total score of 9 is
$\frac{25}{216} \approx 0.116$ and the probability of a total score of 10 is
$\frac{27}{216} = \frac{1}{8} = 0.125$. The difference between these
probabilities is only $\frac{2}{216}$ or $\frac{1}{108}$. Perhaps the most
interesting thing about this result is that the person (the Grand Duke)
who asked Galileo to look at the problem had gambled often enough to
detect the effect of so small a difference in probabilities. This suggests how
much gambling there must have been among some sections of Italian
society at that time.
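
Galileo's counts can be checked by brute force. The following is a minimal Python sketch, not part of the original course materials (the function name `count_totals` is our own), which lists all the equally likely outcomes for three distinguishable dice and counts those giving each total.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def count_totals(num_dice=3, faces=6):
    """Count how many ordered outcomes give each total score."""
    outcomes = product(range(1, faces + 1), repeat=num_dice)
    return Counter(sum(outcome) for outcome in outcomes)


counts = count_totals()
total_outcomes = sum(counts.values())  # 6 * 6 * 6 ordered outcomes

# Exact probabilities of totals of 9 and 10, as fractions.
p9 = Fraction(counts[9], total_outcomes)
p10 = Fraction(counts[10], total_outcomes)

print(total_outcomes, counts[9], counts[10])
print(float(p9), float(p10), p10 - p9)
```

Treating the dice as distinguishable (the `product` call yields ordered triples) is what makes every listed outcome equally likely; counting unordered sets of scores would not.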


3.2 Independence and the multiplication rule 


The idea of independent events has been used implicitly in many of the 
examples discussed in Subsection 3.1. For example, it was assumed that 
the score obtained on rolling a die has no influence on the score obtained 
on rolling the same die a second time or on rolling a second die. Whether a 
coin lands heads up when tossed is unaffected by whether it landed heads 
up the last time it was tossed. And when modelling the births of boys and 
girls, it was assumed that whether a baby born is a girl or a boy is 
unaffected by whether babies born earlier were girls or boys. The 
independence of two events can be defined as follows. 


Two events are independent of each other if the occurrence or not of
one is not influenced by whether or not the other occurs.


The calculation of probabilities involving independent events is often 
straightforward. 


Suppose, for instance, that we want to find the probability that when two
coins are tossed, they both land heads up. (You can think of the two coins
being tossed together or one after the other; it does not matter which, as
the result below holds in either situation.) Whether or not one coin lands
heads up is clearly not influenced by whether the other lands heads up:
the event ‘the first coin lands heads up’ is independent of the event ‘the
second coin lands heads up’. Each coin has probability $\frac{1}{2}$ of
landing heads up. So, in the long run, the first coin will land heads up half
of the time and the second coin will land heads up on half of the occasions
that the first coin lands heads up. Therefore, in the long run, the overall
proportion of the time that both coins land heads up is

$$\frac{1}{2} \times \frac{1}{2} = \frac{1}{4};$$

that is,
P(both coins heads up) = P(first coin heads up) × P(second coin heads up).


Similarly, if a coin and a die are thrown together, then in the long run the
coin will land heads up half the time and the die will show a six on
$\frac{1}{6}$ of the occasions that the coin lands heads up. So, in the long
run, the proportion of the time that a head and a six are obtained together
is

$$\frac{1}{2} \times \frac{1}{6} = \frac{1}{12};$$

that is,
P(head and six) = P(head) × P(six).


These two examples illustrate the multiplication rule for independent 
events, which can be stated formally as follows. 


Multiplication rule for independent events
If E and F are independent events, then

P(E and F) = P(E) × P(F).    (3.3)
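
As an illustration only, the rule can be checked by listing equally likely outcomes. The sketch below assumes a fair coin and a fair six-sided die thrown together; the helper names `prob`, `is_head` and `is_six` are hypothetical and chosen just for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space for one coin and one die: 2 * 6 equally likely outcomes.
sample_space = list(product("HT", range(1, 7)))


def prob(event):
    """Probability of an event, given as a predicate on (coin, die) outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


def is_head(outcome):
    return outcome[0] == "H"


def is_six(outcome):
    return outcome[1] == 6


p_head = prob(is_head)
p_six = prob(is_six)
p_both = prob(lambda outcome: is_head(outcome) and is_six(outcome))

# The product of the separate probabilities should equal the joint probability.
print(p_head, p_six, p_both, p_both == p_head * p_six)
```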


This rule can be used to calculate some of the probabilities that were
found in Subsection 3.1 simply by counting. For example, when a coin is
tossed twice, the probability of a tail on the first toss is $\frac{1}{2}$ and
the probability of a tail on the second toss is $\frac{1}{2}$, so the
probability of tails on both tosses, that is, no heads, is

$$P(\text{no heads}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}.$$

Hence, using (3.2),

$$P(\text{at least one head}) = 1 - P(\text{no heads}) = 1 - \frac{1}{4} = \frac{3}{4},$$

as obtained by counting outcomes in Activity 3.3.


The multiplication rule extends to three or more independent events in an 
obvious way: the probability that all the events occur is obtained by 
multiplying together the probabilities of the separate events. You will need 
to use this in the next activity. 


(a) Use the multiplication rule to find the probability that the first three 
children in a family are all girls. 


(b) What is the probability that a family of three will contain at least one
boy?

(c) In the first half of the 20th century, Mr and Mrs Grover C. Jones of 
Peterson, West Virginia, had an all-son family of fifteen sons. 
(i) What is the probability that a family of fifteen children will all be 
boys? 
(ii) What is the probability that a family of fifteen children will all be 
the same sex?


Comment 


A solution is given on page 60. 


Nicholas Bernoulli suggested that the births of boys and girls could be 
modelled by the rolling of a die with 35 faces, 18 of which represent a boy 
and 17 of which represent a girl. 


(a) According to this model, what is the probability that a baby born is a 
girl? 

(b) What is the probability that a family of three children will all be girls? 

(c) What is the probability that a family of three children will contain at 
least one boy? 

Comment 


A solution is given on page 60. 


In a similar way, since the results of successive rolls of a die are 
independent and the scores on different dice are independent, the 
multiplication rule can be used to answer questions about successive rolls 
of a die, or about rolls of two, three or four dice. For example, the 
probability of obtaining two sixes in two rolls of a die is found by 
multiplying the probability that the first roll gives a six by the probability 
that the second roll gives a six: 

$$P(\text{two sixes in two rolls}) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}.$$


Use the multiplication rule to find the probabilities in the next two 
activities. 


In England and some 
European countries, Hazard 
was a game played with two 
dice. 


One of the gambling games with which the 16th-century Italian 
mathematician Girolamo Cardano was very familiar was called Hazard. 
In Italy, this game was played with three dice. In his autobiography, 
De Vita Propria Liber (The Book of My Life), Cardano wrote the 
following about the total score obtained when three dice are rolled. 


To throw in a fair game at Hazards only three spots ... is a 
natural occurrence and deserves to be so deemed; and even 
when they come up the same way for a second time, if the 
throw be repeated. If the third and fourth plays are the same, 
surely there is occasion for suspicion on the part of a prudent 
man. 

(a) What is the probability that when three dice are rolled, the total score 
on the three dice is 3 (that is, in Cardano’s terminology, the total 
number of spots uppermost is three)? 

(b) Find the probability that a total score of 3 is obtained: (i) in each of 
two successive rolls of three dice; (ii) in each of three successive rolls of 
three dice; (iii) in each of four successive rolls of three dice. Do you 
agree with Cardano that you should be suspicious if a score of 3 occurs 
three or four times in a row? 


Comment 


A solution is given on page 60. 


Activity 3.11 Dice problems


(a) Find the probability of obtaining no sixes in two rolls of a die. 
(b) Find the probability of obtaining no sixes in three rolls of a die. 


(c) Find the probability of obtaining a double-six when two dice are rolled 
once. 


(d) Find the probability of obtaining two double-sixes in two rolls of a pair 
of dice. 


Comment 


A solution is given on page 60. 


The Chevalier de Méré: sixes and double-sixes 


The two simple rules (3.2) and (3.3) are sufficient to tackle the problem 
which was posed to Blaise Pascal by the Chevalier de Méré. The Chevalier 
knew that to bet on obtaining at least one six in 4 rolls of a single die was 
advantageous to him in the long run. He wanted to know why it was not 
also advantageous to bet on obtaining at least one double-six in 24 rolls of 
a pair of dice. (Presumably he discovered this the hard way — by 
experience!) 


A useful point to note here is that ‘at least’ problems are often best
tackled ‘backwards’, that is, using rule (3.2):

$$P(E) = 1 - P(\text{not-}E).$$

That is certainly the case for de Méré’s problem. First consider the
probability of obtaining at least one six in 4 rolls of a single die:

$$P(\text{at least one six in 4 rolls}) = 1 - P(\text{no sixes in 4 rolls}).$$

The probability of failing to get a six in a single roll is $\frac{5}{6}$ so,
using the multiplication rule (3.3),

$$P(\text{no sixes in 4 rolls}) = \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} = \left(\frac{5}{6}\right)^{4}.$$

Hence

$$P(\text{at least one six in 4 rolls}) = 1 - \left(\frac{5}{6}\right)^{4} \approx 0.518.$$

This is greater than $\frac{1}{2}$, thus confirming that betting on obtaining
at least one six in 4 rolls of a single die is advantageous in the long run.
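
The ‘backwards’ calculation for the single die can be sketched in Python as follows. This is an illustrative check rather than part of the original text; the function name `p_at_least_one` is our own, and the cross-check simply lists every possible sequence of four rolls.

```python
from fractions import Fraction
from itertools import product


def p_at_least_one(p_success, n_trials):
    """P(at least one success in n independent trials), via the complement."""
    return 1 - (1 - p_success) ** n_trials


p_rule = p_at_least_one(Fraction(1, 6), 4)

# Cross-check by listing all 6**4 equally likely sequences of four rolls.
rolls = list(product(range(1, 7), repeat=4))
p_count = Fraction(sum(1 for r in rolls if 6 in r), len(rolls))

print(p_rule, round(float(p_rule), 3), p_rule == p_count)
```

The same function can be applied to other ‘at least one’ problems by changing the success probability and the number of trials.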


In the next activity, you are asked to work out the probability of obtaining 
at least one double-six in 24 rolls of a pair of dice: this is the probability 
that de Méré needed to know.


(a) What is the probability of failing to obtain a double-six in a single roll 
of a pair of dice? 


(b) Find the probability of failing to obtain any double-sixes in 24 rolls of 
a pair of dice. 


(c) Hence find the probability of obtaining at least one double-six in 
24 rolls of a pair of dice. Was the Chevalier correct to suspect that 
making the second wager was not a good idea? 


Comment 


A solution is given on page 61. 


Summary of Section 3 


In this section, the concept of equally-likely outcomes has been used to
introduce some basic rules for calculating probabilities. These rules have
been used to tackle five of the problems described in Subsection 1.3:
The three-card game, D’Alembert’s heads, Balanced families, Galileo and
the three-dice problem and The Chevalier de Méré: sixes and double-sixes.
In the next section, the techniques developed in this section will be applied
to another two of the problems: Waiting for a six and Waiting for a girl.
This section has also introduced ideas on ways to take part in discussions.


A tetrahedron is a regular 
four-sided solid, with each 
face an identical equilateral 
triangle. 

Note that for a tetrahedral
die, the score is that on the
face on which it lands, as
opposed to that on the
uppermost face on a cubical
die.


Exercises for Section 3 


Exercise 3.1 Lucky tickets? 


I recently attended a cricket club presentation evening where I was invited 
to draw a ticket out of a hat in exchange for 50p. To win a prize, the 
number on the ticket had to end in 0 or 5. Suppose that there were 

500 tickets in the hat, numbered from 1 to 500, and that I was the first 
person to draw a ticket. 


(a) How many tickets in the hat had numbers ending in 0 or 5? 
(b) What was the probability that I would win a prize? 


Exercise 3.2 Tetrahedral dice 


Two tetrahedral dice each have faces labelled 1, 2, 3 and 4. The dice are 
rolled, and a note is made of the number on the face on which each die 
lands. 


(a) Find the probability that the numbers obtained on the dice add up 
to 9. 


(b) Find the probability that the first die lands on a 2 and the second die 
lands on an odd number. 


Exercise 3.3 More about tetrahedral dice 


(a) A pair of tetrahedral dice are rolled. What is the probability of 
obtaining a double-four? 


(b) Hence find the probability of failing to obtain a double-four in a single 
roll of a pair of tetrahedral dice. 


(c) Find the probability of failing to obtain any double-fours in six rolls of 
a pair of tetrahedral dice. 


(d) Hence find the probability of obtaining at least one double-four in six 
rolls of a pair of tetrahedral dice. 


In this section, the ideas introduced in Section 3 will be used to investigate 
two of the problems described in Subsection 1.3 and explored in the 
computer section: Waiting for a six and Waiting for a girl. The first of
these two problems is described again below. 


In working through this section, try to continue to take forward the ideas 
relating to discussion. In what ways would your discussion be the same or 
different when working with different groups of people? For example, 
people you know and who know the subject matter, people you do not 
know, and people who have never met these ideas before. You may like to 
consider the likely outcomes and how you could respond to them. 


Waiting for a six 


Suppose that you are playing a board game in which players can join in 
the game only when they roll a six with a die. Here are some questions 
about the time (measured in terms of the number of rolls) you might have 
to wait to join in: those posed in Subsection 1.3 are included. 


Question 1: What is the probability that you will be able to join in the 
game straightaway? That is, what is the probability that you will roll a six 
at the first attempt? 


Question 2: What is the probability that you will join in the game after 
your second roll of the die? Or after your third roll? Or after five or ten 
rolls? Are you more likely or less likely to join in the game after your fifth 
roll than after your tenth roll? 


Question 3: When are you most likely to join in the game? That is, what 
is the most likely number of times you will need to roll the die to get a six? 


Question 4: How likely is it that you will still be waiting to join in the 
game after five rolls of the die, or after ten rolls, or even after twenty rolls? 


[Cartoon: ‘Wouldn’t you know it: five minutes I’ve been waiting for a six,
and then three come along together!’]


Question 5: On average, how many times will you have to roll the die in 
order to obtain a six and so join in the game? 


You were invited to explore several of these questions in the computer 
section, so you should have some ideas about their answers. As we tackle 
the questions in this section, several important ideas from probability 
theory will be introduced. First, in Subsection 4.1, the notions of a 
random variable and a probability distribution are discussed. Then, in
Subsection 4.2, the idea of the mean of a probability distribution is 
introduced; this is the mean value predicted by the probability model. 


4.1 Is a long wait likely?


In Activity 2.6 (in Computer Book D), the number of rolls of a die needed 
to obtain a six was simulated: this was repeated 300 times to obtain the 
lengths of 300 waits, and the results were displayed in a frequency 
diagram. Figure 4.1 shows one set of results obtained by a member of the 
course team running the simulation. 


[Frequency diagram: frequency plotted against length of wait]


Figure 4.1 The results of a simulation 


These results give us some idea of the relative likelihood of different 
numbers of rolls of a die being needed to obtain a six. But are they 
typical? Were your results similar? Use your results from Activity 2.6 for 
simulations with 300 waits, and those in Figure 4.1, to answer the 
questions in the next activity. 


(a) What number of rolls of a die do you think you are most likely to need 
to obtain a six? That is, at what stage are you most likely to join in 
the game? 


(b) Are you more likely or less likely to need ten rolls of a die than you are 
to need five rolls? 


(c) Summarise in your own words the information contained in Figure 4.1. 


Comment 


(a) Figure 4.1 shows that just one roll was needed more often than any 
other number of rolls. That is, it appears that the most likely number 
of rolls of a die needed to obtain a six is 1. However, notice that one 
roll was needed only 52 times out of 300, just over one sixth of the 
time. So it is not all that likely that you will obtain a six with your 
first roll of the die. Nevertheless, although it may not be very likely, it 
does seem to be more likely than any of the other possibilities. 


(b) From the figure, it looks as though ten rolls is less likely than five rolls: 
in the simulation, the first six was obtained on the fifth roll 20 times, 
whereas the first six was obtained on the tenth roll only 9 times. 


(c) In general, it appears that you are more likely to need a small number 
of rolls than you are to need a large number. In fact, a very large 
number of rolls looks very unlikely. Roughly speaking, the proportion 
or ‘empirical probability’ seems to decrease as the number of rolls 
needed increases. 


The simulation was repeated ten times altogether by the course team 
member: in eight of the simulations, one roll occurred most often, and in 
the other two simulations, two was the most frequently occurring number 
of rolls needed. It is possible that for some of your simulations two, three 
or even four rolls occurred most often. However, the general tendency for 
the proportions to decrease with increasing numbers of rolls recurred in all 
our simulations. Did your simulations produce similar results? 
To gain accurate estimates of the probabilities of the various numbers of
rolls needed to obtain a six, you would need to carry out a simulation with 
a very large number of waits. Nevertheless, however good the resulting 
estimates, they would only be estimates. There is no guarantee that a 
simulation will come up with a typical set of results; an atypical set of 
results may occur by chance. There is always the possibility that 
conclusions drawn from the results of a simulation will be incorrect. An 
alternative approach to answering the questions asked at the beginning of 
this section is to use the ideas developed in Section 3 to calculate 
corresponding theoretical results. 
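
For readers without access to the course software, a simulation of this kind can be sketched in a few lines of Python. This is not the program used in Computer Book D; the function names and the choice of seed are our own assumptions, and a different seed will give a different set of 300 waits.

```python
import random


def simulate_wait(rng, faces=6):
    """Roll a fair die until the top face (a six) appears; return the roll count."""
    rolls = 1
    while rng.randint(1, faces) != faces:
        rolls += 1
    return rolls


def simulate_waits(n_waits, seed=None):
    """Simulate n_waits independent waits for a six."""
    rng = random.Random(seed)
    return [simulate_wait(rng) for _ in range(n_waits)]


waits = simulate_waits(300, seed=1)  # the seed value is arbitrary
frequency_of_one = waits.count(1)    # how often a single roll was enough
average_wait = sum(waits) / len(waits)

print(frequency_of_one, round(average_wait, 2))
```

Running the sketch with several different seeds shows the point made above: each run gives only an estimate, and the estimates vary from run to run.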


It will simplify the arguments and explanations which follow if we 
represent the number of rolls of a die required to obtain a six by the 
capital letter X. Note that X is not a fixed number: it may take different 
values on different occasions — that is a matter of chance. Sometimes X 
may be 1, on other occasions X may be 2 or 3, or any larger whole number 
we care to name. In fact, X is an example of a random variable — a 
quantity which may take different values on different occasions. 


Having defined the random variable X, from now on we can use the letter 
X instead of the lengthy phrase ‘the number of rolls of a die needed to 
obtain a six’. For example, the probability of obtaining a six on the first 
roll of a die can be written as P(X = 1); this is usually read as ‘the 
probability that X is equal to 1’. Similarly, the probability that two rolls 
are needed to obtain a six can be written as P(X = 2), and so on. 


Let X be the number of rolls of a single die needed to obtain a six. 
You will need to use the multiplication rule for independent events — 
result (3.3) — to work out some of the probabilities asked for below. 


(a) Find P(X = 1), the probability that the first roll results in a six. 


(b) Find P(X = 2), the probability that the first roll does not result in a 
six and the second roll does.


(c) Find P(X = 3), the probability that neither of the first two rolls 
results in a six and the third roll does. 


(d) Find P(X = 4). 
(e) Suggest a formula for P(X = j), the probability that the first six is
obtained on the jth roll, where j = 1, 2, 3, ....


Comment 
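(a) The probability that the first roll results in a six is $\frac{1}{6}$, so

$$P(X = 1) = \frac{1}{6}.$$

(b) The probability that the first roll does not result in a six and the
second roll does is, using (3.2) and (3.3),

$$P(X = 2) = \underbrace{\frac{5}{6}}_{\text{the first roll is not a six}} \times \underbrace{\frac{1}{6}}_{\text{the second roll is a six}} = \frac{5}{36}.$$

(c) The probability that neither of the first two rolls results in a six and
the third roll does is

$$P(X = 3) = \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} = \left(\frac{5}{6}\right)^{2} \times \frac{1}{6} = \frac{25}{216}.$$

(d) Similarly,

$$P(X = 4) = \underbrace{\frac{5}{6} \times \frac{5}{6} \times \frac{5}{6}}_{\text{the first three rolls are not sixes}} \times \underbrace{\frac{1}{6}}_{\text{the fourth roll is a six}} = \left(\frac{5}{6}\right)^{3} \times \frac{1}{6} = \frac{125}{1296}.$$

(e) A pattern is forming here: since the first six appears on the jth roll if
and only if the first j − 1 rolls do not result in a six and the jth roll
does, using the multiplication rule, we obtain

$$P(X = j) = \underbrace{\frac{5}{6} \times \frac{5}{6} \times \cdots \times \frac{5}{6}}_{\text{the first } j-1 \text{ rolls are not sixes}} \times \underbrace{\frac{1}{6}}_{\text{the } j\text{th roll is a six}}.$$

That is,

$$P(X = j) = \left(\frac{5}{6}\right)^{j-1} \times \frac{1}{6}, \quad j = 1, 2, 3, \ldots. \qquad (4.1)$$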


The function defined by formula (4.1) is called the probability function
of X: for each value of j, it gives you the value of the probability
P(X = j). For example, if you require P(X = 3), the probability that
3 rolls of a die are needed to obtain a six, then putting j = 3 in
formula (4.1) gives

$$P(X = 3) = \left(\frac{5}{6}\right)^{3-1} \times \frac{1}{6} = \left(\frac{5}{6}\right)^{2} \times \frac{1}{6} = \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} = \frac{25}{216}.$$


Formula (4.1) describes completely how likely the different possible values 
of X are to occur; that is, it describes how the probabilities (which must 
add up to 1) are distributed among the different possible values, or, using 
the language of probability, it describes completely the probability 
distribution of X. The probabilities are illustrated in Figure 4.2, which 
we shall refer to as a probability diagram, for convenience. Compare this 
probability diagram with the frequency diagram for 1000 simulated waits 
given in Figure 4.3: the shapes are very similar, but the probability 
diagram is smoother than the frequency diagram. 
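
To make the comparison concrete, the sketch below evaluates the probability function for a few values of j and sets the expected frequencies in 300 waits beside the frequencies quoted earlier for Figure 4.1 (52, 20 and 9 waits of length 1, 5 and 10). It assumes formula (4.1) has the form P(X = j) = (5/6)^(j−1) × 1/6; the function name `p_wait` is our own.

```python
from fractions import Fraction


def p_wait(j):
    """P(X = j) for a fair die: j - 1 non-sixes followed by a six."""
    return Fraction(5, 6) ** (j - 1) * Fraction(1, 6)


# Observed frequencies quoted in the text for the 300 waits in Figure 4.1.
observed = {1: 52, 5: 20, 10: 9}

for j, freq in observed.items():
    expected = 300 * p_wait(j)
    print(j, round(float(p_wait(j)), 4), round(float(expected), 1), freq)

# Each probability is the previous one multiplied by the same factor.
ratios = [p_wait(j + 1) / p_wait(j) for j in range(1, 6)]
print(ratios)
```

Working with `Fraction` keeps the probabilities exact, so the constant ratio between successive probabilities can be checked without rounding error.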


Figure 4.2 The probability distribution of X

Figure 4.3 The results of simulating 1000 waits


Use the probability function given by formula (4.1) and illustrated in 
Figure 4.2 to answer the following questions. 


(a) How many rolls of a die are you most likely to require to obtain a six? 
That is, when are you most likely to join in the board game? What is 
the probability that you will need this number of rolls? 


(b) Find the probability that you will need: (i) exactly five rolls to obtain 
a six; (ii) exactly ten rolls to obtain a six. Which probability is the 
greater? 


Comment 


A solution is given on page 61. 


Many probability distributions may be used as models for the uncertainty 
inherent in a wide variety of different situations, and so the most common 
distributions have been given names. The distribution given by 
formula (4.1) and illustrated in Figure 4.2 is an example of a geometric 
distribution. It is called a geometric distribution because the 
probabilities P(X = 1), P(X = 2), P(X = 3), ... form a geometric 
sequence: each probability is obtained from the previous one by 
multiplying it by a fixed number ($\frac{5}{6}$ in this case).


Suppose that we regard obtaining a six as a success, and rolling a die once 
as a trial; then we can think of X as the number of trials required to 
obtain a success. In general, in any sequence of trials of an experiment, 
each of which may result either in ‘success’ or ‘failure’, independent of the 
outcomes of any of the previous trials, the number of trials required to 
obtain a success has a geometric distribution. It is usual to denote the 
probability of success in each trial by the letter p. So, for instance, for
rolling a die p = $\frac{1}{6}$, since the probability of obtaining a six (a
success) is $\frac{1}{6}$.
But what is the formula corresponding to (4.1) for X, the number of trials 
of an experiment required to obtain a success, when the probability of 
success in each trial is p? You are asked to find this formula, that is, to 
find the probability function of X, in the next activity. 


A sequence of trials is carried out: in each trial, the probability of a 
success is p. The random variable X is the number of trials required to 
obtain a success. 


(a) (i) Write down the probability that the first trial is a success, that is, 
the value of P(X = 1). 


(ii) Write down the probability that the first trial is a failure. 


(b) Write down an expression for the probability that the first trial is a 
failure and the second trial is a success, that is, the value of P(X = 2). 


(c) Find an expression for the probability that the first success occurs at 
the third trial, that is, for P(X = 3). 


(d) Suggest a formula for the probability that the first success occurs at 
the jth trial, that is, suggest a formula for P(X = j).


Comment 


A solution is given on page 61. 


The results obtained in this activity are summarised below. 


The geometric distribution 


If a sequence of trials of an experiment is carried out and the
probability of success in each trial is p (0 < p < 1), then X, the
number of trials required to obtain a success, has a geometric
distribution. The probability function of X is given by

$$P(X = j) = (1 - p)^{j-1} \times p, \quad j = 1, 2, 3, \ldots. \qquad (4.2)$$

Figure 4.4 shows the probability distribution of X, the number of trials
required to obtain a success, for a typical value of p. The basic shape of
the probability diagram is the same whatever the particular value of p.
Figure 4.4 A geometric distribution


It is clear from Figure 4.4 and formula (4.2) that, whatever the value of p, 
the first success is most likely to occur at the first trial, and it is more
likely to occur at the second trial than at the third, and so on. So, for 
example, for a board game where you must roll a six to join in, you are 
more likely to join in after just one roll of the die than after two, and you 
are more likely to join in after two rolls than after three, and so on. 
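
The claim that the probabilities decrease whatever the value of p can be explored numerically. The sketch below assumes the probability function of the geometric distribution has the standard form P(X = j) = (1 − p)^(j−1) × p; the function name `geometric_pmf` and the particular values of p tried are our own choices.

```python
def geometric_pmf(j, p):
    """P(X = j): j - 1 failures, each with probability 1 - p, then a success."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if j < 1:
        raise ValueError("j must be a positive whole number")
    return (1 - p) ** (j - 1) * p


for p in (0.1, 1 / 6, 0.5, 0.9):
    probs = [geometric_pmf(j, p) for j in range(1, 8)]
    # Check that each probability is smaller than the one before it.
    decreasing = all(a > b for a, b in zip(probs, probs[1:]))
    print(round(p, 3), [round(x, 4) for x in probs], decreasing)
```

Because each probability is the previous one multiplied by 1 − p, which is less than 1, the decreasing pattern holds for every p strictly between 0 and 1, not just for the values tried here.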


The Smiths want a daughter, and so decide to continue having children
until they have a girl: they will then consider their family to be complete.
Suppose that the probability that each child born is a girl is $\frac{1}{2}$
and that the sex of each child does not depend on the sex of previous
children. If we regard having a girl as a ‘success’, then the geometric
distribution given in formula (4.2) may be used to model the size of the
family they may ultimately have.

(a) What size family are the Smiths most likely to have? 

(b) What is the probability that they have only one child? 

(c) What is the probability that they have exactly three children? 
(d) What is the probability that they have exactly six children? 


Comment 


A solution is given on page 61. 
In Activities 4.2 and 4.3, you obtained answers to the first three questions
that were posed at the beginning of this section: Questions 1, 2 and 3 on 
page 37. And in Activity 4.5 you answered some similar questions about 
the likely family size of a couple who continue their family until they have 
a daughter. 


We shall now turn our attention to Question 4: how likely is it that you 
will still be waiting to join in a board game after five rolls of the die, or 
after ten rolls, or even after twenty rolls? No new ideas are needed to 
tackle this question; you just need to think carefully about the question 
being asked. 


Consider first the probability that you will still be waiting to join in the 
game after five rolls of the die. This is just the probability that you fail to 
roll a six on each of the first five rolls so, by the multiplication rule (3.3), it 
is equal to 


$$\frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} = \left(\frac{5}{6}\right)^{5} \approx 0.402.$$

Notice that you can also think of this as the probability that you will need
more than five rolls of the die to obtain a six, since you will need more
than five rolls if none of the first five rolls results in a six, and vice versa;
the two events are equivalent. So, if X is the number of rolls of the die
needed to obtain a six, then we can write

$$P(X > 5) = \left(\frac{5}{6}\right)^{5};$$

that is, the probability that more than five rolls are needed to obtain a six
is $\left(\frac{5}{6}\right)^{5}$.
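
The equivalence of the two ways of looking at this event can be checked with a short Python sketch. It is an illustration under the fair-die assumption of the text; the function names `p_more_than` and `p_at_most` are hypothetical.

```python
from fractions import Fraction


def p_more_than(k, p=Fraction(1, 6)):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k


def p_at_most(k, p=Fraction(1, 6)):
    """P(X <= k), found by adding P(X = j) for j = 1, ..., k."""
    return sum((1 - p) ** (j - 1) * p for j in range(1, k + 1))


tail = p_more_than(5)
print(tail, round(float(tail), 3))

# 'More than five rolls' and 'at most five rolls' are complementary events.
print(tail + p_at_most(5) == 1)
```

Calling the functions with other values of k, or with a different success probability p, gives the corresponding probabilities for longer waits or for other geometric models.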


(a) Find the probability that you will still not have scored a six after ten 
rolls of the die, that is, find P(X > 10), the probability that you will 
need more than ten rolls of the die to obtain a six. 


(b) Find the probability that you will need at most ten rolls of the die to 
obtain a six.


(c) Find the probability that you will need more than twenty rolls of the 
die to obtain a six, that is, work out P(X > 20). 


Comment 


A solution is given on page 61. 


The Smiths decided to continue having children until they had a daughter 
(see Activity 4.5). 


(a) What is the probability that they will have more than four children? 
(b) What is the probability that they will have four or fewer children? 


Comment 


A solution is given on page 62. 


4.2 How long is an average wait? 


The final question posed at the beginning of this section (page 37) asked 
‘How many times, on average, will you have to roll the die to obtain a 
six?’, or equivalently ‘How long will you have to wait, on average, to join 
in the board game?’. This is one of the questions that you were invited to 
explore in the computer section. In Activity 2.6 in Computer Book D, you 
ran a series of simulations, each involving 300 waits. Part of the output of 
each simulation was the average length of the waits. Our ten simulations 
produced the following average waits. 


6.2   6.0   6.0   6.3   5.6   6.1   6.3   6.4   6.6   5.8 


The average wait varied from one simulation to another because of the 
nature of the model: it is a model for the uncertainty involved in rolling a 
die. So how long should you expect to wait, on average, to join in the 
game? What does the model predict for the average wait? The ten values 
above are all estimates of this mean wait. Some of these values are less 
than 6 and some are greater than 6, but it looks as though the mean wait 
predicted by the model should be somewhere close to 6. But how can we 
find its exact value? 


In Subsection 1.2, the probability that an event occurs was defined to be 
the proportion of the time ‘in the long run’ that the event occurs. This 
suggests a possible definition for the mean wait predicted by the model: it 
is the long-run average wait. If we simulate a long sequence of waits and 
calculate the average wait as each wait is simulated, then these averages 
should settle down to the mean wait that we are seeking. 
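
The settling-down of the average can be explored with a short simulation. The Python sketch below is illustrative only and is not the course's simulation software; the function names `simulate_wait` and `average_wait`, and the choice of seed, are our own assumptions.

```python
import random


def simulate_wait(rng, p=1/6):
    """Number of trials up to and including the first success."""
    rolls = 1
    while rng.random() >= p:  # a failure occurs with probability 1 - p
        rolls += 1
    return rolls


def average_wait(n_waits, seed=0, p=1/6):
    """Average length of n_waits simulated waits."""
    rng = random.Random(seed)
    total = sum(simulate_wait(rng, p) for _ in range(n_waits))
    return total / n_waits


# Averages over longer and longer sequences of waits.
for n in (300, 3000, 30000, 300000):
    print(n, round(average_wait(n), 3))
```

The figures printed depend on the seed, but the averages for the longer sequences should vary much less from run to run than those based on 300 waits.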


Suppose that in a simulation of a long sequence of waits, a wait of length 1 
occurs $f_1$ times, a wait of length 2 occurs $f_2$ times, and so on. Then the 
average wait is given by 


\[ \text{average wait} = \frac{1}{n}\,(1 \times f_1 + 2 \times f_2 + 3 \times f_3 + \cdots), \]


where n is the total number of waits in the sequence (that is, 
$n = f_1 + f_2 + f_3 + \cdots$). 


We can rewrite this as 


\[ \text{average wait} = 1 \times \frac{f_1}{n} + 2 \times \frac{f_2}{n} + 3 \times \frac{f_3}{n} + \cdots. \]


The mean wait predicted by the model is the long-run value of this average 
wait. However, $f_1/n$ is the proportion of waits that are of length 1, and for 
a long sequence of waits this proportion will be approximately equal to the 
probability that one roll is required to obtain a six; that is, P(X = 1). 
Similarly, $f_2/n$ will be approximately equal to P(X = 2), and so on. 
Hence, as we take longer and longer sequences of waits, the average wait 
will settle down to 


\[ 1 \times P(X = 1) + 2 \times P(X = 2) + 3 \times P(X = 3) + \cdots. \]


So we have the result 


\[ \text{mean wait} = \sum_j j \times P(X = j). \tag{4.3} \]


That is, the mean wait is equal to the sum of the products j × P(X = j). 


The mean $\bar{x}$ of a sample of n observations is given by the formula 


\[ \bar{x} = \frac{1}{n}\,(x_1 f_1 + x_2 f_2 + \cdots + x_k f_k), \]


where $x_1, x_2, \ldots, x_k$ are the different values observed in the sample, 
and $f_1, f_2, \ldots, f_k$ are their frequencies. 


The letter $\mu$ is pronounced ‘mew’. 


This result is derived in an appendix to this chapter. 


The mean predicted by the model is sometimes referred to as the mean of 
the probability distribution or the mean of the random variable X, and is 
denoted by the Greek lower-case letter $\mu$. The mean of a probability 
distribution is an important idea in probability and statistics, and one to 
which we shall return in Chapter D2. 


In general, the mean $\mu$ of a random variable X is defined to be 


\[ \mu = \sum_j j \times P(X = j), \tag{4.4} \]


where the summation is over all values j which X can take (that is, for 
which P(X = j) > 0). You will not be expected to calculate means of 
probability distributions. In this block, we are interested in the results 
themselves rather than in the algebra involved in their calculation, so we 
shall not take you through the details. 


You may already have an idea about a formula for the mean of a geometric 
distribution. In Activity 2.7 in Computer Book D, you were invited to 
investigate the mean wait by finding the average wait for a number of 
simulations. For Waiting for a six, p, the probability of success at each 
trial (that is, of obtaining a six with each roll of the die), is $\tfrac{1}{6}$. We have 
already observed from the results of some simulations that it looks as 
though the mean wait is about 6. In Activity 2.7, you were also asked to 
investigate the mean waits for other values of p: $p = \tfrac{1}{2}$, $p = \tfrac{1}{5}$, p = 0.4 and 
some values of your own choosing. No doubt you discovered that for $p = \tfrac{1}{2}$ 
the mean wait is about 2, for $p = \tfrac{1}{5}$ the mean wait is about 5, and for 
p = 0.4 the mean wait is about 2.5. This suggests that, in general, the 
mean wait is 1/p. 


Using the definition of the mean wait (4.3) and result (4.2), which gives the 
probability P(X = j) for a geometric distribution, it can be shown that the 
mean wait is indeed equal to 1/p. This result is stated in the box below. 


The mean of a geometric distribution 


If a sequence of trials is carried out and the probability of success in 
each trial is p, then the mean number of trials required to obtain a 
success is 1/p. 
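

The boxed result can be checked numerically by adding up the products j × P(X = j) for a large number of terms. The Python sketch below does this using the geometric probabilities P(X = j) = (1 − p)^(j−1) p, the form used in the appendix to this chapter. The function name `mean_by_summation` and the cut-off of 10 000 terms are our own choices, and the check is an illustration rather than a proof.

```python
def mean_by_summation(p, terms=10_000):
    """Approximate the sum of j * P(X = j) for j = 1, ..., terms,
    where P(X = j) = (1 - p)**(j - 1) * p (geometric distribution)."""
    return sum(j * (1 - p) ** (j - 1) * p for j in range(1, terms + 1))


# Compare the truncated sum with 1/p for the values of p used in the text.
for p in (1/2, 1/5, 0.4, 1/6):
    print(round(p, 4), round(mean_by_summation(p), 6), round(1 / p, 6))
```

For each value of p the two printed figures agree to the accuracy shown, because the terms that are left out of the sum are negligibly small.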


Use this result to answer the questions in the next activity. 


(a) How many times, on average, will you have to roll a die to obtain a six? 


(b) What size family, on average, will couples like the Smiths have in order 
to get the daughter they long for? (See Activity 4.5.) 


(c) Tom hits the bull’s-eye on a darts board on roughly < of his attempts. 
How many darts does he need to throw on average to hit the 
bull’s-eye? 

Comment 


A solution is given on page 62. 


A lot of terminology concerning probabilities has been introduced in this 
section: for example, random variable, probability function, probability 
distribution and the mean of a probability distribution. Some notation has 
also been introduced: for example, P(X = j) for the probability that a 
random variable X takes the value j, and $\mu$ for the mean of a probability 
distribution. Becoming familiar with the language and notation of 
probability is an important step towards understanding and applying ideas 
in probability and statistics. In the final section of this chapter, you will be 
using some of the ideas and notation from this section to tackle two more 
of the problems from Subsection 1.3. You will find the solution to the first 
of these problems much easier to follow if you are comfortable with the 
terms and notation introduced in this section. So, before moving on, spend 
some time now clarifying and consolidating your work on this section. 


Summarise in your own words what you understand by the terms random 
variable, probability function and mean of a probability distribution 
(perhaps including an example if you find it helpful). What is a geometric 
distribution, and for what sort of problems is it useful? Also make a note of 
any new notation from this section, and of how you are going to read it to 
yourself when you come across it in the text. Remember that it is a good 
idea to use words that convey the meaning of the expressions, rather than 
just the names of the symbols. Look at one or two of the more complicated 
mathematical expressions (such as those in (4.2), (4.3) and (4.4)). Do the 
words you use as you read them help you to understand what they mean? 


How would you explain these terms to a non-mathematician? How could 
you help him or her begin to appreciate their meaning? What questions 
might you pose? Learning File Sheet 3 offers a possible framework. 


Summary of Section 4 


In this section, the problems Waiting for a six and Waiting for a girl have 
been tackled. In the course of investigating these problems, the ideas of a 
random variable and of a probability distribution and its mean have been 
introduced. These ideas will be revisited in Chapter D2. A probability 
distribution called the geometric distribution has been discussed; this is 
the probability distribution of the number of trials of an experiment 
needed to achieve a success. A number of results were derived, and a 
formula was given for the mean number of trials needed to achieve a 
success. We shall make use of this result in the next section to tackle one 
of the two problems still outstanding from Subsection 1.3. 


Exercises for Section 4 


Exercise 4.1 Bernoulli’s families 


Nicholas Bernoulli suggested that the sex of a child at birth could be 
modelled by rolling a die with 35 faces, 18 faces representing a boy and the 
other 17 a girl. 


(a) According to this model, what is the probability that a couple such as 
the Smiths will have four children? (See Activity 4.5, Waiting for a 
girl.) 

(b) What is the probability that they will have more than four children? 


(c) What is the average family size of couples like the Smiths who 
continue having children until they have a daughter? 


Exercise 4.2 Waiting for the jackpot 


In Activity 1.4, you found that if you buy one ticket in the British National 
Lottery, then the probability of winning a share of the jackpot is $\frac{1}{13\,983\,816}$. 


(a) Approximately how many years on average do people who buy one 
ticket a week have to wait to win a share of the jackpot? 


(b) If you buy one ticket a week, what is the approximate probability that 
you will not yet have won a share of the jackpot after 50 years? 


We have now obtained solutions to all but two of the problems described 
in Subsection 1.3. In this section, the remaining two problems — Collecting 
a complete set of musicians and Coinciding birthdays — will be tackled 
using some of the ideas and results of the previous sections. You should 
find that working through these problems will help you to consolidate your 
understanding of the main ideas and results of this chapter. You may also 
find the answers interesting and surprising! As well as consolidating your 
understanding of some of the ideas and results, the Learning File activity 
for this section asks you to summarise your ideas about discussion work 
and consider its value as part of learning. 


5.1 Collecting a complete set of musicians 


A cereal manufacturer is giving away a toy musician in each packet of a 
certain popular breakfast cereal. There are eight different musicians, but 
there is no way of knowing which musician is inside any particular packet 
without opening it. The question here is: ‘How many packets, on average, 
will you have to buy to collect a complete set of musicians?’ 




In Activity 2.8 in Computer Book D, you simulated collecting a set of 
musicians, using the computer. The simulation is based on the assumption 
that each packet is equally likely to contain any one of the eight different 
musicians available. What assumptions are we making about the 
distribution of musicians in the packets of cereal by using the simulation? 


Comment 


One assumption is that there are equal numbers of the eight musicians 
available. A second is that no musicians either predominate or are missing 
from consignments delivered to a particular shop or a particular area. 


Figure 5.1 shows the results obtained for one such simulation carried out 
by a course team member. The musicians are numbered from 1 to 8. 


[Frequency diagram: frequency plotted against musician number] 


Figure 5.1 The results of one simulation (number of packets = 29) 


In this particular simulation, by the time musician number 4 was obtained 
(the last one needed to complete the set), there were two or more of each 
of the other seven musicians and no fewer than eight of musician 

number 7. Altogether, 29 packets were required to complete the set. That 
seems a lot! Is it typical, or was it simply a run of bad luck? Did you 
obtain similar results? When the course team member ran the simulation 
a further nine times, the following numbers of packets were obtained. 


The average of the values obtained from these ten simulations is 28.7, so 
judged on the basis of this evidence, 29 is not an unusually large number of 
packets. This average, 28.7, is one estimate of the average number of 
packets required to obtain a complete set of eight musicians. 


Activity 5.2 Estimating the mean 


Looking at all the results given above, do you think an estimate of 28.7 for 
the mean number of packets is a good one? Do you think it is likely to be 
close to the ‘true’ mean — that is, the mean predicted by the model? Or 
might it differ quite a lot from this ‘true’ mean? 


Comment 


It is possible that 28.7 is a good estimate. However, the results were very 
variable: the smallest number of packets needed was 15, while in one 
simulation 59 were required. If only one of the values had been different, 
then the estimate could have been much larger or much smaller; for 
instance, if there had been another 15 instead of the 59, then the estimate 
would have been only 24.3; or if there had been another 59 instead of 

the 15, then the estimate would have risen to 33.1. Since the results were 
so variable, far more simulations are needed. So the estimate of 28.7 could 
be some way from the mean. 
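

To see how the estimate behaves as more simulations are run, the collecting process can be sketched in a few lines of Python. This is an illustrative sketch, not the course's simulation software: the function names `packets_needed` and `estimate_mean` and the seed are our own, and it makes the same assumption as the text, that each packet is equally likely to contain any one of the eight musicians.

```python
import random


def packets_needed(rng, n_types=8):
    """Open packets until every one of n_types musicians has been found."""
    seen = set()
    packets = 0
    while len(seen) < n_types:
        seen.add(rng.randrange(n_types))  # each musician equally likely
        packets += 1
    return packets


def estimate_mean(n_sets, seed=0, n_types=8):
    """Average number of packets over n_sets simulated collections."""
    rng = random.Random(seed)
    return sum(packets_needed(rng, n_types) for _ in range(n_sets)) / n_sets


# Estimates based on 10, 100 and 10 000 simulated collections.
for n_sets in (10, 100, 10_000):
    print(n_sets, round(estimate_mean(n_sets), 2))
```

Changing the seed changes the estimates. Those based on only ten collections can be expected to move around a good deal, as the ten results quoted above suggest, while those based on many collections are much more stable.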


The alternative to running simulations in order to estimate the average 
number of packets required to complete a set is to use the model itself to 
calculate the mean, that is, to use probability theory. In fact, we can find 
the mean by making use of some of the ideas from Section 4, ‘Waiting for a 
success’. 


Clearly, you will obtain your first musician from the first cereal packet you 
open. But how many more packets will you have to open to obtain a 
second musician different from the first? This is the subject of the next 
activity. 


(a) What is the probability that when you open a packet you will find a 
musician different from the first one? 


(b) How many packets, on average, will you need to open to obtain a 
musician different from the first one? 


Comment 


(a) Seven out of eight musicians are different from the first one, so the 
probability that a packet contains a different musician is $\tfrac{7}{8}$. 


(b) As long as you have musicians of only one type, the next packet will 
contain a different musician with probability $\tfrac{7}{8}$. So if you regard 
opening a packet and finding a second musician as a ‘success’, then 
the number of packets you will need to open to obtain a ‘success’ 
(a second musician) has a geometric distribution. In Subsection 4.2 
(page 46), you saw that the mean of a geometric distribution is 1/p, 
where p is the probability of success in each trial. So the mean number 
of additional packets needed to obtain a second musician is 


\[ \frac{1}{7/8} = \frac{8}{7}. \]


So you need 1 packet to obtain your first musician, and $\tfrac{8}{7}$ packets, on 
average, to obtain your second musician. Hence, on average, you will 
need to open a total of 


\[ 1 + \tfrac{8}{7} \approx 2.14 \text{ packets} \]


to obtain your first two musicians. 


(a) Once you have collected two different musicians, what is the 
probability that the next packet you open will contain a musician 
different from the first two? 


(b) How many packets, on average, will you need to open to obtain your 
third different musician? 


Comment 


(a) Six of the eight musicians are different from the first two, so the 
probability of finding a different musician in the next packet is $\tfrac{6}{8}$. 


(b) Again, if obtaining a different musician is a ‘success’, then the average 
number of packets you will need to open to obtain a ‘success’ is 1/p, 
where p = P(success). So the average number of additional packets 
required to obtain a third different musician is 


\[ \frac{1}{6/8} = \frac{8}{6}. \]


And the total number of packets required, on average, to obtain your 
first three different musicians is 


\[ 1 + \tfrac{8}{7} + \tfrac{8}{6} \approx 3.48. \]


Can you see a pattern developing here? 


(a) Once you have three different musicians, how many additional packets, 
on average, will you need to open to obtain your fourth different 
musician? 

(b) Once you have four different musicians, how many additional packets, 
on average, will you need to open to obtain your fifth different 
musician? 


(c) How many additional packets, on average, will you need to open to 
obtain your sixth, seventh and eighth musicians? 


(d) How many packets in total will you need to open, on average, to 
collect a complete set of eight musicians? 


Comment 


(a) When you have three different musicians, the probability that the next 
packet contains a different musician is $\tfrac{5}{8}$, so the average number of 
additional packets needed to obtain a fourth musician is $1/\tfrac{5}{8} = \tfrac{8}{5}$. 


(b) Similarly, on average, you will need to open $1/\tfrac{4}{8} = \tfrac{8}{4}$ additional 
packets to obtain your fifth different musician. 


(c) You will need to open $1/\tfrac{3}{8} = \tfrac{8}{3}$ packets, on average, to obtain a sixth 
different musician, $1/\tfrac{2}{8} = \tfrac{8}{2}$ packets to obtain your seventh musician, 
and $1/\tfrac{1}{8} = \tfrac{8}{1}$ packets to obtain your eighth and final musician in the 
set. 

(d) So the mean total number of packets you will need to open to collect a 
complete set of musicians is 

\[ 1 + \tfrac{8}{7} + \tfrac{8}{6} + \tfrac{8}{5} + \tfrac{8}{4} + \tfrac{8}{3} + \tfrac{8}{2} + \tfrac{8}{1} \approx 21.7, \]


or approximately 22 packets. 
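

The sum in part (d) can be evaluated exactly, and the same pattern applies to a set of any size. The Python sketch below is an illustration; the function name `mean_packets` is our own.

```python
from fractions import Fraction


def mean_packets(n_types=8):
    """Mean number of packets needed to collect all n_types musicians.

    With k different musicians already collected, the next packet holds
    a new one with probability (n_types - k) / n_types, so the mean wait
    for that new musician is n_types / (n_types - k).
    """
    return sum(Fraction(n_types, n_types - k) for k in range(n_types))


total = mean_packets()
print(total, float(total))  # 761/35, about 21.74
```

Calling `mean_packets` with a different argument gives the mean for a set of a different size, under the same assumption that every item is equally likely to turn up in each packet.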


Compare the theoretical result above with your intuitive answer from 
Subsection 1.3 and the number you obtained using simulation in 

Activity 2.8 (in Computer Book D). Are you surprised that the mean 
number of packets needed to collect a set of size eight is as large as 22? 

Or were you expecting this sort of result or a larger number after doing the 
simulations? In what way has simulation helped you to understand what is 
involved in modelling and solving the problem? 


Comment 


Most people’s intuitive answers are numbers much smaller than 22. Was 
your value lower than this? If so, then you were probably surprised by the 
results of your simulations. However, having done the simulations, you 
were probably not very surprised by the theoretical result — not even if 
your estimate was as far from the mean as was our estimate of 28.7 (see 
page 50). 


You may also have been surprised by how greatly the number of packets 
varied from one simulation to the next. This is something you would not 
have been aware of if you had simply used theory to calculate the mean. 


The approach used to find the mean number of packets required to obtain 
a complete set of eight musicians can be used to find the average size of 
‘complete’ families. Try this in the next activity. 


Some couples do not regard their family as complete until they have at 
least one boy and one girl, and so decide to continue having children until 
they have at least one son and at least one daughter. Assuming that a boy 
and a girl are equally likely, what is the mean size of such families? How 
does this mean compare with the estimate you obtained in Activity 2.9(b) 
in Computer Book D, using the simulation software? 


Comment 


A solution is given on page 62. 


5.2 Coinciding birthdays 


In the final problem from Subsection 1.3, you were asked to guess the 
probability that at least two children in a class of 24 (no twins) will have 
the same birthday. 


This is one of the ‘at least’ problems that is most easily solved 
‘backwards’, that is, using result (3.2): 


\[ P(E) = 1 - P(\text{not-}E). \]


If E is the event that at least two children share a birthday, then ‘not-E’ is 
the event that all the children have different birthdays. It is much easier to 
calculate the probability of this event directly than to calculate the 
probability of the actual event required. 


What assumptions would you need to make in order to tackle this problem? 


Comment 


The basic assumption you need to make is that a child’s birthday is 
equally likely to fall on any day of the year. And to keep the problem 
manageable, ignore leap years. 


Having made the above assumption, we are ready to tackle the problem. 
For clarity of presentation, let us suppose that the children are listed in 

some way (in alphabetical order, or by age, or whatever). The first child 
on the list can have any day for his or her birthday — it does not matter 
which day. 


The second child must not share a birthday with the first. Counting 
outcomes, 364 days out of 365 give a different birthday, so the probability 
that the second child does not share a birthday with the first is $\tfrac{364}{365}$. 


(a) If the first two children’s birthdays are on different days, what is the 
probability that the third child does not share a birthday with either 
of the first two? 


(b) If the first three children’s birthdays are on different days, what is the 
probability that the fourth child does not share a birthday with any of 
the first three? 


(c) What is the probability that none of the first four children share a 
birthday? 
Comment 


A solution for parts (a) and (b) is given on page 62. For part (c), see below. 


If we pick any four people at random, then on 364 occasions out of 365 
the first two will have different birthdays, on $\tfrac{363}{365}$ of these occasions the 
third person’s birthday will be different from the first two, and on $\tfrac{362}{365}$ of 
these occasions the fourth person will have a different birthday from any of 
the first three. So the proportion of times in the long run that all four will 
have different birthdays is 


\[ \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \approx 0.984. \]


This is the probability that all the first four children have different 
birthdays. Can you see a pattern forming? 


(a) What is the probability that the first five children all have different 
birthdays? 


(b) If the first 23 children have different birthdays, what is the probability 
that the last child’s birthday is different from the other twenty-three? 


(c) Write down an expression for the probability that the birthdays of the 
24 children are all different, and hence calculate the probability. 


(d) What is the probability that at least two of the children share a 
birthday? 


Comment 


(a) The fifth birthday will be different from the first four on $\tfrac{361}{365}$ of 
occasions, on average, so the probability that all five birthdays are 
different is 


\[ \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \frac{361}{365} \approx 0.973. \]


(b) If the first 23 birthdays are all different, then there are 342 possible 
different days for the twenty-fourth birthday. So the probability that 
the twenty-fourth birthday is different from the first 23 is $\tfrac{342}{365}$. 


(c) Hence the probability that all 24 birthdays are different is 


\[ \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \frac{361}{365} \times \cdots \times \frac{342}{365} \approx 0.462. \]


Here the first factor is the probability that the second birthday is 
different from the first, the second factor is the probability that the 
third is different from the first two, and so on; the last factor is the 
probability that the 24th is different from the first 23. 


(d) So the probability that none of the children share a birthday is less 
than $\tfrac{1}{2}$, and the probability that at least two of the children share a 
birthday is 


\[ 1 - 0.462 = 0.538. \]


Was your ‘guess’ anywhere near this? Or did you think the chances of 
a shared birthday were much lower? Do you find this result surprising? 
You may not find it so surprising if two people in your family (an 
aunt, a cousin, ...) share a birthday, or if two of your friends do. And 
the chances are that they do! 
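

The calculation for a class of 24 extends to a group of any size. The Python sketch below is an illustration under the assumptions made in the text (every day of the year equally likely, leap years ignored); the function name `prob_shared_birthday` is our own.

```python
def prob_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday,
    assuming all days are equally likely and ignoring leap years."""
    prob_all_different = 1.0
    for k in range(n_people):
        # With k different birthdays already taken, days - k remain.
        prob_all_different *= (days - k) / days
    return 1 - prob_all_different


print(round(prob_shared_birthday(24), 3))  # 0.538, as found above
```

Trying other group sizes shows how quickly the probability of a shared birthday grows as the group gets larger.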


To complete your work on this chapter, take some time to review your 
work on notation and discussion. 


Discussing work with others can be an extremely useful way of helping you 
to sort out ideas and clarify understanding. Often, by exploring your 
thoughts with others, presenting conjectures and working through 
problems, difficulties can be sorted out. It is important to recognise that 
discussion can be an important way to help learning and extend ideas. 


Working in teams forces you to consider how to communicate with other 
people; some you may know, others you may not know; some may be in 
the same field as yourself, others may have different expertise. Summarise 
your ideas on how you would modify or change the way you talk or discuss 
ideas depending on different groups of people. Use some of the examples 
and activities from the chapter where you can. 


Finally, try to sum up for yourself how useful you find talking with other 
people to help you to learn. Are there situations when this type of activity 
is most useful, or not at all useful? Can you think of situations when you 
find it more beneficial to work by yourself? You may wish to make notes 
on Learning File Sheet 4. 


Summary of Section 5 


In this section, the ideas and techniques discussed in the earlier sections 
have been used to tackle the questions posed in Subsection 1.3 which were 
still outstanding. We hope that working through this section has helped 
you to consolidate your understanding of the work in this chapter. 


In this chapter, you have been introduced to some basic ideas of probability 
theory. Part of your work involved examining your own intuitions about 
chance events. You were asked to use the simulations part of the statistics 
software to investigate a number of problems and to provide data on which 
to base conjectures. Probability theory was then used to tackle these and 
other problems. This involved a lot of new terminology and notation. 


Before you move on to the next chapter, it is important for you to reflect 
on what you have learned. To help you focus your attention on two 
important features of this chapter, try the two activities suggested below. 


How have you assimilated the language and ideas of probability that have 
been introduced? Has ‘reading for understanding’ helped you? What did 
you find most interesting? Which idea did you find most difficult? 


Which of the problems from Subsection 1.3 did you find most interesting? 
Were there any results which you found surprising? Were any of the 
results contrary to your intuition? If so, which were they, and do you now 
feel that you have a greater understanding of the problems and the nature 
of random events? 


Learning outcomes 


You have been working towards the following learning outcomes. 


Terms to know and use 


Outcome, equally-likely outcomes, event, independent events, 
random variable, probability function, probability distribution, 
frequency diagram, mean of a probability distribution, 
geometric distribution. 


Symbols and notation to know and use 


The notation P() for a probability. 
The symbol $\mu$ for the mean of a probability distribution. 
The notation P(X = j) for a probability involving a random variable. 


Ideas to be aware of 


◊ How the uncertainty in a random variable can be represented by a 
probability distribution. 

◊ How the mean of a probability distribution is interpreted. 


Features of the statistics software 


◊ Run probability simulations to investigate the behaviour of a model 
for a range of situations involving uncertainty. 


Learning skills 


◊ Reflect on the use of new terms in order to clarify their meaning. 

◊ Decide what words to use when reading new notation in order to make 
its meaning clear whenever it appears. 


Investigating processes to aid understanding 


◊ Use a systematic approach where possible. 

◊ Look for a pattern in the results of calculations for special cases, and 
conjecture a general result. 

◊ Look for a pattern in the results of simulations of special cases, and 
conjecture a general result. 

◊ Apply general results in specific instances. 


This is formula (4.4). 


In Section 4, you saw that the mean of a probability distribution is defined 
to be 


\[ \mu = \sum_j j \times P(X = j). \]


The probability function of a geometric distribution is given by 
formula (4.2) as 


\[ P(X = j) = (1 - p)^{j-1} p, \quad j = 1, 2, 3, \ldots. \]


So the mean of a geometric distribution is given by 

\[ \mu = \sum_{j=1}^{\infty} j\,(1 - p)^{j-1} p; \]


that is, 

\[ \mu = p + 2(1-p)p + 3(1-p)^2 p + 4(1-p)^3 p + \cdots. \tag{A.1} \]

If we multiply both sides by 1 − p, we obtain 

\[ (1-p)\mu = (1-p)p + 2(1-p)^2 p + 3(1-p)^3 p + 4(1-p)^4 p + \cdots. \tag{A.2} \]

Subtracting equation (A.2) from (A.1) gives 

\[ \mu - (1-p)\mu = p + (1-p)p + (1-p)^2 p + (1-p)^3 p + \cdots. \]

The left-hand side of this equation is equal to 

\[ \mu - \mu + p\mu = p\mu. \]


The right-hand side is the sum of the probabilities P(X = 1), P(X = 2), 
P(X = 3), P(X = 4), ... for a geometric distribution. And since X must 
take one or other of the values 1, 2, 3, 4, ..., the sum of these probabilities 
is equal to 1. Therefore we have 


\[ p\mu = 1, \]


and hence 


\[ \mu = \frac{1}{p}, \]


as required. 


Solution 1.3 


(a) All the 52 cards are equally likely to be at the 
top of the pack, so 


\[ P(\text{ace of spades}) = \tfrac{1}{52}. \]

(b) There are 4 aces in a pack of 52 cards, so, on 
average, we would expect an ace to be at the top 
of the pack 4 times out of 52. That is, 


\[ P(\text{ace}) = \tfrac{4}{52} = \tfrac{1}{13}. \]


(c) There are 13 hearts in a pack of 52 cards, so 


\[ P(\text{heart}) = \tfrac{13}{52} = \tfrac{1}{4}. \]


Solution 1.4 


(a) Each ticket is equally likely to win, and there are 
a million tickets. 


(i) The probability of winning if you buy one 
ticket is $\frac{1}{1\,000\,000}$. 

(ii) The probability of winning if you buy ten 
tickets is $\frac{10}{1\,000\,000} = \frac{1}{100\,000}$. 


(b) Each of the 13 983 816 selections is equally likely 
to occur. 


(i) The probability of winning a share of the 
jackpot with one selection is $\frac{1}{13\,983\,816}$. 


(ii) The probability of winning with ten 
different selections is $\frac{10}{13\,983\,816}$. 

(iii) The probability of winning with 100 
different selections is $\frac{100}{13\,983\,816} \approx 0.000\,007$. 


Even with 100 different selections, the 
probability of winning is extremely small. 


Solution 3.2 


An argument identical to that used when the side 
showing is red can be used when the side showing is 
white. There are three white sides; two of these have 
a white side on the other side of the card, so the 
probability that the other side is white is $\tfrac{2}{3}$. It is not 
a fair bet: the entertainer will win in the long run on 
this bet too. 


Solution 3.4 
(a) There are eight possible equally-likely outcomes: 


hhh, hht, hth, htt, thh, tht, tth, ttt. 


Notice the systematic way that the possible 
outcomes of three tosses of a coin have been 
written down: each of the first four outcomes is 
h followed by one of the four possible outcomes 
of two tosses; and each of the second four 
outcomes is t followed by one of the four possible 
outcomes of two tosses. 


(b) (i) The probability of no heads is 


\[ P(\text{no heads}) = \tfrac{1}{8}. \]


(ii) Seven of the eight possible outcomes include 
at least one head, so 


\[ P(\text{at least one head}) = \tfrac{7}{8}. \]


(iii) Four of the eight possible outcomes include 
at least two heads, so 


\[ P(\text{at least two heads}) = \tfrac{4}{8} = \tfrac{1}{2}. \]


(iv) Three of the outcomes contain exactly two 
heads, so 


\[ P(\text{two heads}) = \tfrac{3}{8}. \]


Solution 3.5 


(a) (i) The four possible family patterns are 
GG, GB, BG, BB. 


(ii) Two of these patterns contain one girl and 
one boy, so the probability of one girl and one 
boy in a family of two children is $\tfrac{2}{4} = \tfrac{1}{2}$. 


(b) (i) There are sixteen possible family patterns 
for families of size four: 


GGGG, GGGB, GGBG, GGBB, 
GBGG, GBGB, GBBG, GBBB, 
BGGG, BGGB, BGBG, BGBB, 
BBGG, BBGB, BBBG, BBBB. 


(ii) Six of these patterns contain two girls and 
two boys, so the probability that the Johnsons 
will achieve their ideal family is $\tfrac{6}{16} = \tfrac{3}{8}$. 

(c) The Watsons are right about their chances 
(although their reasoning is dubious). However, 
the Johnsons are not right: the probability of 
obtaining their ideal family is only $\tfrac{3}{8}$, not $\tfrac{1}{2}$ as 
they suppose. 
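

The listing and counting of family patterns can be reproduced by a short program. The Python sketch below is an illustration only; the function name `family_patterns` is our own, and it assumes, as in the solution, that all patterns of a given size are equally likely.

```python
from fractions import Fraction
from itertools import product


def family_patterns(n_children, girls_wanted):
    """List all patterns of G and B for n_children, and give the
    proportion of patterns containing exactly girls_wanted girls."""
    patterns = ["".join(p) for p in product("GB", repeat=n_children)]
    favourable = [p for p in patterns if p.count("G") == girls_wanted]
    return patterns, Fraction(len(favourable), len(patterns))


patterns, prob = family_patterns(4, 2)
print(len(patterns), prob)  # 16 patterns, probability 3/8
```

The patterns are generated in the same systematic order as in the list above, starting with GGGG and ending with BBBB.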


Solution 3.7 
(a) Although there are only 6 partitions of 9, they 
are not all equally likely. Some of them can 
occur in more than one way; for example, 522 is 
not the same outcome as 252 or 225. It is 
important to distinguish between the three dice, 
for instance by imagining that they are different 
colours, and by always recording the score on 
one particular die first. 


The possible equally-likely outcomes which give 
a total of 9 are as follows. 


Partitions   Outcomes 

(621)        621  612  261  216  162  126 
(531)        531  513  351  315  153  135 
(522)        522  252  225 
(441)        441  414  144 
(432)        432  423  342  324  243  234 
(333)        333 


There are 25 possible outcomes giving a total of 
9, so the probability of obtaining a total score of 
9 with three dice is $\tfrac{25}{216} \approx 0.116$. 


There are 27 possible equally-likely outcomes 
giving a total of 10. These are listed below. 


Partitions   Outcomes 

(631)        631  613  361  316  163  136 
(622)        622  262  226 
(541)        541  514  451  415  154  145 
(532)        532  523  352  325  253  235 
(442)        442  424  244 
(433)        433  343  334 


So the probability of obtaining a total score of 
10 with three dice is $\tfrac{27}{216} = 0.125$. 
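

The counts of 25 and 27 can be confirmed by listing all 216 equally-likely outcomes for three distinguishable dice and counting those with the required total. The Python sketch below is one way of doing this; the function name `count_totals` is our own.

```python
from itertools import product


def count_totals(total, n_dice=3):
    """Count the ordered outcomes of n_dice fair dice that sum to total."""
    return sum(1 for roll in product(range(1, 7), repeat=n_dice)
               if sum(roll) == total)


for total in (9, 10):
    ways = count_totals(total)
    print(total, ways, round(ways / 6**3, 3))
```

Because `product` treats the dice as distinguishable, outcomes such as 522, 252 and 225 are counted separately, exactly as in the tables above.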


Solution 3.8 
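(a) The probability that each child is a girl is $\tfrac{1}{2}$, 
independently of whether the other children are 
girls, so the probability that the first three 
children are all girls is 


\[ P(GGG) = \tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{8} = 0.125. \]

\[ \begin{aligned} P(\text{at least one boy}) &= 1 - P(\text{no boys}) \\ &= 1 - P(\text{all girls}) \\ &= 1 - \tfrac{1}{8} \\ &= \tfrac{7}{8} = 0.875. \end{aligned} \]


(i) The probability that fifteen children are all 
boys is 

\[ \left(\tfrac{1}{2}\right)^{15} = \tfrac{1}{32\,768}. \]

(ii) Similarly, the probability that fifteen 
children are all girls is 


\[ \left(\tfrac{1}{2}\right)^{15} = \tfrac{1}{32\,768}. \]


So the probability that fifteen children are all the 
same sex, that is, either all boys or all girls, is 

\[ \tfrac{1}{32\,768} + \tfrac{1}{32\,768} = \tfrac{2}{32\,768} = \tfrac{1}{16\,384}. \]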


Solution 3.9 


(a) If 17 out of 35 faces correspond to a girl, then,
assuming that all 35 faces are equally likely to
come up, the probability of a girl is 17/35.


(b) The probability that three children are all girls is

P(all girls) = (17/35)^3 ≈ 0.115.


(c) P(at least one boy) = 1 − P(no boys)
≈ 1 − 0.115
= 0.885.


Solution 3.10 


(a) 


The total score is 3 if the score on each die is 1.
The probability that the score on all three dice
is 1 is

1/6 × 1/6 × 1/6 = 1/216.


(i) P(total score of 3 twice) = (1/216)^2 = 1/46656
≈ 2.14 × 10^-5


(ii) P(total score of 3 three times) = (1/216)^3
≈ 9.92 × 10^-8


(iii) P(total score of 3 four times) = (1/216)^4
≈ 4.59 × 10^-10


These probabilities are very small indeed. 
Cardano was justified in regarding the 
occurrence of a total score of 3 three or four 
times in a row as worthy of suspicion. 
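To see how quickly these probabilities shrink, the repeated-total calculation can be sketched as follows. This assumes fair dice and independent throws; the name `repeated_total_of_three` is hypothetical and used only for illustration.

```python
from fractions import Fraction

# Probability that three fair dice all show 1, giving a total score of 3.
P_TOTAL_THREE = Fraction(1, 6) ** 3  # 1/216


def repeated_total_of_three(times):
    """Probability of a total score of 3 on `times` successive throws."""
    return P_TOTAL_THREE ** times


for k in (2, 3, 4):
    p = repeated_total_of_three(k)
    print(k, p, f"{float(p):.3g}")
```

Exact fractions are used so that the tiny values are not affected by rounding until they are printed.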


Solution 3.11 


(a) 


The probability that the score on a single roll of
a die is not 6 is 5/6, and the scores on two rolls of
a die are independent, so, by the multiplication
rule, the probability that neither score is a 6 in
two rolls is

5/6 × 5/6 = 25/36.


The probability of no sixes in three rolls of a die
is

5/6 × 5/6 × 5/6 = 125/216.


The probability of a double-six when two dice
are rolled is the probability that both dice come
up 6, that is,

1/6 × 1/6 = 1/36.


The probability of two double-sixes in two rolls
of a pair of dice is

1/36 × 1/36 = 1/1296.


Solution 3.12 


(a) From part (c) of Activity 3.11, the probability of
obtaining a double-six is 1/36, so, using (3.2), the
probability of failing to get a double-six is

1 − 1/36 = 35/36.


(b) The probability of failing to get any double-sixes
in 24 rolls of a pair of dice is, using the
multiplication rule (3.3),


(35/36)^24 ≈ 0.509.


(c) Hence, using (3.2), the probability of getting at
least one double-six in 24 rolls of a pair of dice is


1 − P(no double-sixes) = 1 − (35/36)^24 ≈ 0.491.


The Chevalier was correct to suspect that the 
second wager was not a good one. (He must 
have gambled a lot to detect such a small 
difference in probabilities!) 
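The three steps of this solution (complement, multiplication rule, complement again) can be sketched as a short calculation. The helper name `at_least_once` is our own, and independent rolls of fair dice are assumed.

```python
def at_least_once(p_single, n_trials):
    """Probability that an event of probability p_single occurs at least once
    in n_trials independent trials: 1 - P(it never occurs)."""
    p_never = (1 - p_single) ** n_trials  # multiplication rule
    return 1 - p_never


p_double_six = 1 / 36
p_none = (1 - p_double_six) ** 24
p_some = at_least_once(p_double_six, 24)
print(round(p_none, 3), round(p_some, 3))
```

The result falls just short of 1/2, which is why the wager is slightly unfavourable.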


Solution 4.3 


(a) It is clear from Figure 4.2 that the most likely 
number of rolls needed to obtain a six is 1; and 
we have already calculated its probability in 
Activity 4.2: 

P(X = 1) = 1/6.

(b) The probability that exactly five rolls are needed
is, using formula (4.1) with j = 5,


P(X = 5) = (5/6)^4 × 1/6 ≈ 0.080.


The probability that exactly ten rolls are needed
is


P(X = 10) = (5/6)^9 × 1/6 ≈ 0.032.


You are more likely to obtain your first six on 
your fifth roll than on your tenth roll. 
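The comparison between the fifth and tenth rolls can be checked numerically. This is a minimal sketch of the geometric probabilities, assuming a fair die with p = 1/6; the function name `geometric_pmf` is our own.

```python
def geometric_pmf(j, p):
    """P(X = j): j - 1 failures followed by a success on trial j."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    return (1 - p) ** (j - 1) * p


p_six = 1 / 6
for j in (1, 5, 10):
    print(j, round(geometric_pmf(j, p_six), 3))
```

Each extra roll multiplies the probability by 5/6, so the probabilities decrease steadily and the first roll is always the most likely one.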


Solution 4.4 


(a) (i) The probability that the first trial results in
a success is p, so


P(X = 1) = p.


(ii) The probability that the first trial is a
failure is 1 − p.


(b) P(X = 2) = P(failure) × P(success) = (1 − p)p


(c) P(X = 3) = P(failure) × P(failure) × P(success)
= (1 − p)^2 p


(d) Continuing the pattern, we obtain the following
expression for P(X = j), in which j − 1 failures are
followed by a success:


P(failure) × ⋯ × P(failure) × P(success)
= (1 − p)^(j−1) p.


Solution 4.5


(a) The size of such a family has a geometric
distribution, so the most likely size is 1. (In this
case, p = P(success) = 1/2.)


(b) If X is the number of children they have, then
the probability that they will have only one
child is

P(X = 1) = p = 1/2.

(c) The probability that they will have three
children is


P(X = 3) = (1 − p)^2 p = (1/2)^2 × 1/2 = 1/8.


(d) The probability that they will have six children
is


P(X = 6) = (1 − p)^5 p = (1/2)^5 × 1/2 = 1/64.


Solution 4.6 


(a) The probability that more than ten rolls are 
needed to obtain a six is just the probability 
that none of the first ten rolls is a six: 


P(X > 10) = (5/6)^10 ≈ 0.162.


(b) The probability that at most ten rolls are
needed is, using result (3.2),


P(X ≤ 10) = 1 − P(X > 10)
≈ 1 − 0.162 = 0.838.


(c) The probability that more than twenty rolls of
the die are needed is


P(X > 20) = (5/6)^20 ≈ 0.026.
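The tail probabilities used here follow from one observation: more than n trials are needed exactly when the first n trials all fail. A minimal sketch, assuming a fair die and using our own function names `more_than` and `at_most`:

```python
def more_than(n, p):
    """P(X > n): the first n independent trials are all failures."""
    return (1 - p) ** n


def at_most(n, p):
    """P(X <= n), obtained as the complement of P(X > n)."""
    return 1 - more_than(n, p)


p_six = 1 / 6
print(round(more_than(10, p_six), 3))
print(round(at_most(10, p_six), 3))
print(round(more_than(20, p_six), 3))
```

Working with the complement avoids summing ten separate geometric probabilities.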


Solution 4.7 


(a) The probability that the Smiths will have more 
than four children is just the probability that 
the first four children are all boys, that is, 


P(X > 4) = (1/2)^4 = 1/16.


(b) The probability that they will have four children
or fewer is, using result (3.2),


P(X ≤ 4) = 1 − P(X > 4) = 15/16.


Solution 4.8 


(a) Since p = 1/6, the average (that is, the mean)
number of rolls of a die needed to obtain a six is
1/(1/6) = 6.


(b) Since p = 1/2, the average size of family for
couples like the Smiths is 1/(1/2) = 2.


(c) Since p= 2, the average number of darts needed 
to hit the bull’s-eye is 1/2 oe 3 = 4z. 


Note that a mean does not have to be a whole 
number. This result just says that the average 
number of darts Tom needs to throw to hit the 
bull’s-eye is between 4 and 5. 


Solution 5.7 


This problem is equivalent to collecting a complete 
set of size 2, the two ‘objects’ being a girl and a boy. 


The first child is certain to be either a girl or a boy.
Thereafter, the probability that the next child is of
the other sex is 1/2, so the mean number of additional
children needed to obtain a child of the other sex is
1/(1/2) = 2.


Hence the mean total number of children needed to 
obtain at least one child of each sex is 1 + 2 = 3. 
The mean size of ‘completed’ families is 3. 


A course team member carried out 100 simulations. 
The sizes of the ‘completed’ families varied between 2 
and 8, and the average family size was 3.03, quite 
close to the theoretical mean family size. 
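A simulation of this kind can be sketched as follows. This is not the course team's program, only an illustration under the same assumptions (each child is equally likely to be a girl or a boy, independently of the others); the function names and the choice of seed are our own.

```python
import random


def completed_family_size(rng):
    """Simulate one family: keep having children until both sexes appear."""
    first = rng.choice("GB")
    size = 1
    while True:
        size += 1
        if rng.choice("GB") != first:  # a child of the other sex completes the set
            return size


def simulate(n_families, seed=0):
    """Return the list of simulated 'completed' family sizes."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return [completed_family_size(rng) for _ in range(n_families)]


sizes = simulate(100_000, seed=1)
mean_size = sum(sizes) / len(sizes)
print(min(sizes), max(sizes), round(mean_size, 2))
```

With only 100 simulations the average can easily differ from 3 by a tenth or so; a larger number of simulated families should give an average much closer to the theoretical mean.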


Solution 5.9 


(a) There are 363 possible different days for the
third child’s birthday, so the probability that it
is on a different day from the first two children’s
birthdays is 363/365.


(b) There are 362 possible different days for the
fourth child’s birthday, so the probability that it
is on a different day from the first three
children’s birthdays is 362/365.
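The pattern in parts (a) and (b) can be written as a single rule: if the earlier children all have different birthdays, each of them removes one available day from the 365. A minimal sketch, assuming a 365-day year with all days equally likely; the function name `p_new_birthday` is hypothetical.

```python
from fractions import Fraction

DAYS_IN_YEAR = 365  # leap years are ignored


def p_new_birthday(k):
    """Probability that the k-th child's birthday differs from those of the
    previous k - 1 children, given that those k - 1 birthdays are all different."""
    return Fraction(DAYS_IN_YEAR - (k - 1), DAYS_IN_YEAR)


print(p_new_birthday(3))  # third child
print(p_new_birthday(4))  # fourth child
```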


Solution 3.1 


(a) There are two tickets ending in either 0 or 5 in
each group of ten consecutively numbered
tickets. Since there are 50 groups of ten tickets
in the 500 tickets, there are 50 × 2 winning
tickets; that is, there are 100 tickets with a
number ending in 0 or 5.


(b) Using formula (3.1),


P(win) = 100/500 = 1/5.


Solution 3.2 


(a) There are 4 possible outcomes for the score on 
the first die and 4 possible outcomes for the 
score on the second die. So there are 4 x 4= 16 
possible outcomes when the two dice are rolled 
together. A total of 5 can be obtained in four 
ways: 


1 on the first die, 4 on the second die; 
2 on the first die, 3 on the second die; 
3 on the first die, 2 on the second die; 
4 on the first die, 1 on the second die. 


So, using formula (3.1), 


P(total of 5) = 4/16 = 1/4.


(b) The probability that the first die lands on a 2 is
1/4, and the probability that the second die lands
on an odd number is 2/4 = 1/2. The scores on the
two dice are independent, so, using the
multiplication rule (3.3), the probability that the
first die lands on a 2 and the second die lands on
an odd number is

1/4 × 1/2 = 1/8.


Solution 3.3 


(a) The probability that the score on a single roll of
a tetrahedral die is 4 is 1/4; so, by the
multiplication rule (3.3), the probability that
both dice land on 4 is 1/4 × 1/4 = 1/16.

(b) Using (3.2), the probability of failing to obtain a
double-four when the two dice are rolled is


1 − P(double-four) = 1 − 1/16 = 15/16.


(c) The probability of failing to get any double-fours
in six rolls of a pair of tetrahedral dice is, using
the multiplication rule (3.3),


(15/16)^6 ≈ 0.679.


(d) Hence, using (3.2), the probability of obtaining
at least one double-four in six rolls of a pair of
tetrahedral dice is

1 − P(no double-fours) = 1 − (15/16)^6
≈ 1 − 0.679
= 0.321.


Solution 4.1 


The probability that each child born is a girl is 17/35, so

p = P(success) = 17/35.

(a) If X is the number of children that a couple such
as the Smiths have, then the probability that
they have four children is

P(X = 4) = (1 − p)^3 p = (18/35)^3 × 17/35 ≈ 0.066.


(b) The probability that they have more than four
children, P(X > 4), is given by


P(first four children are boys) = (18/35)^4 ≈ 0.070.


(c) The average family size for such couples is

1/p = 35/17 ≈ 2.06.

That is, the average number of children in such
families is just over 2. (Recall that the mean
does not have to be a whole number.)


Solution 4.2 


(a) The probability of a ‘success’ in a single week is
p = 1/13 983 816, so the average time people have to
wait to win a share of the jackpot is


1/p = 13 983 816 weeks ≈ 13 983 816/52 years
≈ 269 000 years,


that is, over a quarter of a million years!


(b) The number of weeks in 50 years is
approximately 50 × 52 = 2600, so we require the
probability of failing to win a share of the
jackpot for 2600 weeks in a row. This is


(13 983 815/13 983 816)^2600 ≈ 0.9998,


so you almost certainly will not win, even if you 
make a selection every week for 50 years! 
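Both parts of this solution can be reproduced with a few lines of arithmetic. The sketch below assumes one selection per week, 52 weeks in a year and independent weekly draws, as in the solution; the variable names are our own.

```python
WAYS = 13_983_816          # number of equally likely selections, from the solution
p_win = 1 / WAYS           # probability of a 'success' in a single week

# (a) Mean waiting time for a geometric distribution is 1/p trials (weeks).
mean_weeks = 1 / p_win
mean_years = mean_weeks / 52

# (b) Probability of failing to win in every one of 2600 weeks.
weeks = 50 * 52
p_never_win = (1 - p_win) ** weeks

print(round(mean_years, -3))
print(round(p_never_win, 4))
```

Even over 50 years the chance of never winning stays above 0.999, which is the sense in which you almost certainly will not win.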